How to set up a Hadoop cluster on CentOS 7

Setting up a Hadoop cluster involves the following steps:

  1. Install Java: Install Java on all nodes and set the correct JAVA_HOME environment variable.
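For example, on CentOS 7 the OpenJDK 8 packages from the base repositories are a common choice (the package name and JVM path below are typical, but may differ on your system):

sudo yum install -y java-1.8.0-openjdk-devel
# Make JAVA_HOME available to every login shell; adjust the path to match your JDK
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' | sudo tee /etc/profile.d/java.sh
echo 'export PATH=$PATH:$JAVA_HOME/bin' | sudo tee -a /etc/profile.d/java.sh
source /etc/profile.d/java.sh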
  2. Download Hadoop: Download the binary package of Hadoop from the official Apache website and extract it to the same directory on all nodes.
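For example (the Hadoop version, download location, and installation directory below are illustrative; use the release and paths you actually need):

# Download a Hadoop 2.x binary release from the Apache archive and unpack it
cd /opt
sudo wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
sudo tar -xzf hadoop-2.7.7.tar.gz
sudo mv hadoop-2.7.7 /opt/hadoop
# Repeat (or copy the extracted directory) so every node uses the same path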
  3. Configure Hadoop: Navigate to the Hadoop installation directory, edit the etc/hadoop/core-site.xml file, and add the following configuration:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode_hostname:9000</value>  <!-- namenode_hostname is the hostname of the master node -->
    </property>
</configuration>

Then edit the etc/hadoop/hdfs-site.xml file and add the following configuration:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>  <!-- number of replicas -->
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/hadoop/dfs/name</value>  <!-- data storage path on the master node (NameNode) -->
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/hadoop/dfs/data</value>  <!-- data storage path on the slave nodes (DataNodes) -->
    </property>
</configuration>
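Note that the paths configured above must exist and be writable by the account that runs Hadoop; assuming that account is a user named hadoop (an assumption, adjust to your setup), they can be created like this on the relevant nodes:

# Create the NameNode/DataNode storage directories and hand them to the hadoop user
sudo mkdir -p /data/hadoop/dfs/name /data/hadoop/dfs/data
sudo chown -R hadoop:hadoop /data/hadoop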

Finally, copy the etc/hadoop/mapred-site.xml.template file to etc/hadoop/mapred-site.xml and add the following configuration:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
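For the template step above, a typical command (run from the Hadoop installation directory) is:

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml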
  4. Configure master-slave nodes: List the hostname or IP address of each slave node in the etc/hadoop/slaves file, one node per line.
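For example, a cluster with two slave nodes could use an etc/hadoop/slaves file like the following (the hostnames are placeholders):

# Write the worker hostnames, one per line, into etc/hadoop/slaves
cat > etc/hadoop/slaves <<EOF
datanode1
datanode2
EOF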
  5. Set up passwordless SSH login: Generate an SSH key pair on the master node and distribute the public key to all nodes; this can be done with the ssh-keygen and ssh-copy-id commands.
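A sketch of this step, run as the account that will start Hadoop on the master node (the hadoop user and the hostnames below are placeholders):

# Generate a passphrase-less key pair, then push the public key to every node
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id hadoop@namenode_hostname
ssh-copy-id hadoop@datanode1
ssh-copy-id hadoop@datanode2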
  6. Start the Hadoop cluster: Execute the following commands on the master node to start the Hadoop cluster.
sbin/start-dfs.sh
sbin/start-yarn.sh
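Before the very first start, the NameNode normally has to be formatted once on the master node:

# Initialize the HDFS metadata directory configured in dfs.namenode.name.dir
bin/hdfs namenode -format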
  7. Verify the cluster: The NameNode web UI is available at http://namenode_hostname:50070.
  8. The YARN ResourceManager web UI is available at http://namenode_hostname:8088.
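Besides the web UIs, a quick way to confirm the daemons are running is jps on each node plus an HDFS report from the master (run from the Hadoop installation directory):

# On the master: expect NameNode, SecondaryNameNode and ResourceManager processes
jps
# Lists live DataNodes and their capacity
bin/hdfs dfsadmin -report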

The above is a basic process for setting up a Hadoop cluster; specific configurations and adjustments can be made according to your needs.
