How to install a Hadoop pseudo-distributed cluster
A pseudo-distributed Hadoop cluster runs all of the Hadoop daemons on a single machine, simulating a multi-node cluster for development and testing. The installation steps are as follows:
- Download and unzip Hadoop
First, download the latest Hadoop release archive from the official Apache Hadoop website and extract it to a directory of your choice.
- Set up Hadoop environment variables
Add the following lines to your .bashrc or .bash_profile file:
export HADOOP_HOME=/path/to/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
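As a minimal sketch of this step, the commands below download and unpack a release into /opt; the version number (3.3.6) and mirror URL are assumptions, so substitute the current release and your preferred directory:
# Download and extract a Hadoop release; the version number and mirror URL are assumptions
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz -C /opt
# Append the two export lines above to ~/.bashrc (with /opt/hadoop-3.3.6 as /path/to/hadoop),
# then reload the profile so the hadoop commands are on the PATH
source ~/.bashrc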
- Configure the Hadoop cluster by editing the core-site.xml, hdfs-site.xml, and mapred-site.xml files in Hadoop's configuration directory ($HADOOP_HOME/etc/hadoop in Hadoop 2.x and later; the conf directory only in old 1.x releases).
- The core-site.xml file:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
- The hdfs-site.xml file (configuration for HDFS, the Hadoop Distributed File System):
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
- The mapred-site.xml file:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
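After saving these files, you can sanity-check that the core-site.xml and hdfs-site.xml values are being picked up; this assumes HADOOP_HOME and PATH are already set as above (note that hdfs getconf does not read mapred-site.xml):
# Print the effective values of the keys configured above
hdfs getconf -confKey fs.defaultFS      # expected: hdfs://localhost:9000
hdfs getconf -confKey dfs.replication   # expected: 1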
- Format HDFS
Run the following command to format the NameNode and initialize HDFS (the older hadoop namenode -format form is deprecated):
hdfs namenode -format
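If you want to confirm that formatting succeeded, you can check that the NameNode metadata directory was created; the path below assumes the default hadoop.tmp.dir of /tmp/hadoop-<username>, so adjust it if you have set hadoop.tmp.dir or dfs.namenode.name.dir explicitly:
# Inspect the freshly formatted NameNode metadata (default location is an assumption; adjust if overridden)
ls /tmp/hadoop-$(whoami)/dfs/name/current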
- Start the Hadoop cluster
To start HDFS and YARN, run the following commands:
start-dfs.sh
start-yarn.sh
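To confirm that the daemons are actually running, you can list the Java processes with the JDK's jps tool; in a healthy pseudo-distributed setup you should typically see the five processes named in the comment below (exact names can vary slightly between versions):
# Expected daemons: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
jps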
- Verify the Hadoop cluster
By opening Hadoop's web interfaces in a browser, you can view the status and running condition of the cluster. The NameNode UI defaults to http://localhost:9870/ in Hadoop 3.x (http://localhost:50070/ in Hadoop 2.x), and the YARN ResourceManager UI defaults to http://localhost:8088/.
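As a final end-to-end check, you can submit one of the example MapReduce jobs that ship with the binary distribution; the jar path below follows the standard layout, but the exact file name depends on the version you installed:
# Run the bundled pi-estimation example on the pseudo-distributed cluster
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10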
By following the above steps, you can successfully set up and configure a Hadoop pseudo-distributed cluster. In this environment, you can develop and test Hadoop programs.