How to install a Hadoop pseudo-distributed cluster
A pseudo-distributed Hadoop cluster runs all of the Hadoop daemons on a single machine, simulating a multi-node cluster for development and testing. The installation steps are as follows:
- Download and unzip Hadoop
First, download the latest Hadoop release archive from the official Apache Hadoop website and extract it to a directory of your choice.
- Set up Hadoop environment variables
Add the following lines to your .bashrc or .bash_profile file:
export HADOOP_HOME=/path/to/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
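As a minimal sketch of this step, the commands below download and unpack a release into /opt; the version number (3.3.6) and mirror URL are assumptions, so substitute the current release and your preferred directory:
# Download and extract a Hadoop release; the version number and mirror URL are assumptions
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz -C /opt
# Append the two export lines above to ~/.bashrc (with /opt/hadoop-3.3.6 as /path/to/hadoop),
# then reload the profile so the hadoop commands are on the PATH
source ~/.bashrc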
- Configure the Hadoop cluster by editing the core-site.xml, hdfs-site.xml, and mapred-site.xml files in Hadoop's configuration directory ($HADOOP_HOME/etc/hadoop in Hadoop 2.x and later; the conf directory only in old 1.x releases).
- The core-site.xml file:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
- The hdfs-site.xml file (configuration for HDFS, the Hadoop Distributed File System):
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
- The mapred-site.xml file:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
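After saving these files, you can sanity-check that the core-site.xml and hdfs-site.xml values are being picked up; this assumes HADOOP_HOME and PATH are already set as above (note that hdfs getconf does not read mapred-site.xml):
# Print the effective values of the keys configured above
hdfs getconf -confKey fs.defaultFS      # expected: hdfs://localhost:9000
hdfs getconf -confKey dfs.replication   # expected: 1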
- Format HDFS
Run the following command to format the NameNode and initialize HDFS (the older hadoop namenode -format form is deprecated):
hdfs namenode -format
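If you want to confirm that formatting succeeded, you can check that the NameNode metadata directory was created; the path below assumes the default hadoop.tmp.dir of /tmp/hadoop-<username>, so adjust it if you have set hadoop.tmp.dir or dfs.namenode.name.dir explicitly:
# Inspect the freshly formatted NameNode metadata (default location is an assumption; adjust if overridden)
ls /tmp/hadoop-$(whoami)/dfs/name/current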
- Start the Hadoop cluster
To start HDFS and YARN, run the following commands:
start-dfs.sh
start-yarn.sh
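To confirm that the daemons are actually running, you can list the Java processes with the JDK's jps tool; in a healthy pseudo-distributed setup you should typically see the five processes named in the comment below (exact names can vary slightly between versions):
# Expected daemons: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
jps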
- Verify the Hadoop cluster
By opening Hadoop's web interfaces in a browser, you can view the status and running condition of the cluster. The NameNode UI defaults to http://localhost:9870/ in Hadoop 3.x (http://localhost:50070/ in Hadoop 2.x), and the YARN ResourceManager UI defaults to http://localhost:8088/.
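As a final end-to-end check, you can submit one of the example MapReduce jobs that ship with the binary distribution; the jar path below follows the standard layout, but the exact file name depends on the version you installed:
# Run the bundled pi-estimation example on the pseudo-distributed cluster
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10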
By following the above steps, you can successfully set up and configure a Hadoop pseudo-distributed cluster. In this environment, you can develop and test Hadoop programs.