How do you set up the master node of a Hadoop cluster?
Setting up the master node of a Hadoop cluster involves the following steps:
- Install Hadoop: Hadoop requires Java, so first install a supported JDK. Then install the Hadoop software package on the master node. You can download the latest stable release from the official Apache Hadoop website.
- Configure the master node: Edit the Hadoop configuration files in the etc/hadoop directory of the Hadoop installation. The main files are:
  - core-site.xml: Configures core Hadoop settings, such as the default file system URI (fs.defaultFS) and the base directory for temporary files (hadoop.tmp.dir).
  - hdfs-site.xml: Configures the Hadoop Distributed File System (HDFS), such as the replication factor (dfs.replication) and the block size (dfs.blocksize).
  - mapred-site.xml: Configures Hadoop MapReduce, such as the framework that runs jobs (mapreduce.framework.name, normally set to yarn) and the default number of reduce tasks per job (mapreduce.job.reduces).
  - yarn-site.xml: Configures YARN's ResourceManager and NodeManagers, such as the scheduler type (yarn.resourcemanager.scheduler.class), the ResourceManager hostname (yarn.resourcemanager.hostname), and the memory each NodeManager can allocate to containers (yarn.nodemanager.resource.memory-mb).
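- Enable passwordless SSH login: The Hadoop start scripts use SSH to launch daemons on the other nodes, so the master node must be able to log in to every worker node without a password. To set this up, generate a key pair on the master node (for example with ssh-keygen) and add its public key to the ~/.ssh/authorized_keys file on each worker node. The ssh-copy-id tool automates this step.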
- Configure the worker nodes: On the master node, list the hostnames or IP addresses of the worker (slave) nodes in the hadoop/etc/hadoop/slaves file, one per line. In Hadoop 3.x this file is named workers instead of slaves.
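- Start the Hadoop cluster: Run the following commands on the master node, with Hadoop's bin and sbin directories on the PATH:
  - Format the NameNode (first start only): hdfs namenode -format
  - Start HDFS: start-dfs.sh
  - Start YARN: start-yarn.sh
  - Formatting creates a new, empty HDFS namespace and erases any existing HDFS metadata, so run it only once, before the first start. The two start scripts then launch the HDFS daemons (NameNode, DataNodes, SecondaryNameNode) and the YARN daemons (ResourceManager, NodeManagers).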
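- Verify the cluster configuration: In a browser, open the NameNode web UI on the master node at http://<master-host>:50070 (port 9870 in Hadoop 3.x), where <master-host> stands for the master node's hostname or IP address. Use it to confirm that the DataNodes are registered and to check the status of HDFS blocks. Running applications and their tasks appear in the YARN ResourceManager web UI, which listens on port 8088 by default.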
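All four configuration files share one format: a configuration root element that contains property entries, each with a name and a value. The Python sketch below generates them from a dictionary. It assumes Python 3.9+ for ET.indent. The hostname master, port 9000, the /opt/hadoop/tmp path, and all numeric values are placeholders for illustration, not recommended settings. The helper names site_xml and write_site_files are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def site_xml(properties: dict) -> str:
    """Render a Hadoop *-site.xml document from {property name: value}."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-print with two-space indentation (Python 3.9+)
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"

# Placeholder values for illustration only: "master" is an assumed hostname,
# 9000 an assumed NameNode RPC port, and the sizes should match your hardware.
MASTER = "master"
SITE_FILES = {
    "core-site.xml": {
        "fs.defaultFS": f"hdfs://{MASTER}:9000",
        "hadoop.tmp.dir": "/opt/hadoop/tmp",
    },
    "hdfs-site.xml": {
        "dfs.replication": 3,
        "dfs.blocksize": 134217728,  # 128 MiB, written in bytes
    },
    "mapred-site.xml": {
        "mapreduce.framework.name": "yarn",
        "mapreduce.job.reduces": 2,
    },
    "yarn-site.xml": {
        "yarn.resourcemanager.hostname": MASTER,
        "yarn.nodemanager.aux-services": "mapreduce_shuffle",
        "yarn.nodemanager.resource.memory-mb": 8192,
    },
}

def write_site_files(conf_dir: str) -> None:
    """Write every file in SITE_FILES into conf_dir, overwriting existing files."""
    target = Path(conf_dir)
    target.mkdir(parents=True, exist_ok=True)
    for filename, props in SITE_FILES.items():
        (target / filename).write_text(site_xml(props), encoding="utf-8")
```

Calling write_site_files on the etc/hadoop directory of an existing installation replaces the files there, so back up the originals first.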
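For the passwordless SSH step, ssh-copy-id normally does the work. The following Python sketch only illustrates the effect on a worker node. It appends the master's public key to ~/.ssh/authorized_keys if the key is not already present, then restricts the file's permissions. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

def merged_authorized_keys(existing: str, pub_key: str) -> Optional[str]:
    """Return authorized_keys content with pub_key appended, or None if already present."""
    key = pub_key.strip()
    if key in (line.strip() for line in existing.splitlines()):
        return None
    if existing and not existing.endswith("\n"):
        existing += "\n"  # keep the new key on its own line
    return existing + key + "\n"

def authorize_key(pub_key: str, home: Optional[Path] = None) -> bool:
    """Run on a worker: add the master's public key to ~/.ssh/authorized_keys."""
    ssh_dir = (home or Path.home()) / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    keys_file = ssh_dir / "authorized_keys"
    existing = keys_file.read_text() if keys_file.exists() else ""
    merged = merged_authorized_keys(existing, pub_key)
    if merged is None:
        return False
    keys_file.write_text(merged)
    keys_file.chmod(0o600)  # sshd (with StrictModes) ignores key files writable by others
    return True
```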
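The worker list (workers in Hadoop 3.x, slaves in Hadoop 2.x) is a plain text file with one host per line. Here is a minimal Python sketch that removes blank entries and duplicates before writing the file. The function names and any hostnames you pass in are hypothetical.

```python
from pathlib import Path
from typing import Iterable

def workers_file_text(hosts: Iterable[str]) -> str:
    """One host per line; blank entries and duplicates dropped, order kept."""
    unique = []
    for host in hosts:
        host = host.strip()
        if host and host not in unique:
            unique.append(host)
    return "".join(host + "\n" for host in unique)

def write_workers_file(conf_dir: str, hosts: Iterable[str], hadoop_major: int = 3) -> Path:
    """Hadoop 3.x reads etc/hadoop/workers; Hadoop 2.x reads etc/hadoop/slaves."""
    path = Path(conf_dir) / ("workers" if hadoop_major >= 3 else "slaves")
    path.write_text(workers_file_text(hosts))
    return path
```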
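You can also script the start-up sequence. This Python sketch assumes Hadoop's bin and sbin directories are on the PATH. By default it does a dry run that only prints the commands. Pass first_start=True only when initializing a new cluster, because formatting the NameNode erases existing HDFS metadata.

```python
import subprocess
from typing import List

def startup_commands(first_start: bool) -> List[List[str]]:
    """Commands to run on the master node, in order."""
    commands = []
    if first_start:
        # Creates a new, empty HDFS namespace; erases existing NameNode metadata.
        commands.append(["hdfs", "namenode", "-format"])
    commands.append(["start-dfs.sh"])   # NameNode, DataNodes, SecondaryNameNode
    commands.append(["start-yarn.sh"])  # ResourceManager, NodeManagers
    return commands

def start_cluster(first_start: bool = False, dry_run: bool = True) -> None:
    for cmd in startup_commands(first_start):
        print("+", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise and stop at the first failing step
```

For example, start_cluster(first_start=True, dry_run=False) would format HDFS and start both services on a brand-new cluster. Use start_cluster(dry_run=False) on later starts.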
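To check the web interfaces from a script instead of a browser, you can use a Python sketch like the one below. It requests each UI and reports whether it answers with HTTP 200. It assumes the default ports: 9870 for the NameNode in Hadoop 3.x (50070 in 2.x) and 8088 for the ResourceManager. A successful response only shows that the daemon is running, not that every worker has registered.

```python
import urllib.request
from typing import Dict

def web_ui_urls(host: str, hadoop_major: int = 3) -> Dict[str, str]:
    """Default web UI addresses; adjust if the ports were changed in the *-site.xml files."""
    namenode_port = 9870 if hadoop_major >= 3 else 50070
    return {
        "namenode": f"http://{host}:{namenode_port}/",
        "resourcemanager": f"http://{host}:8088/",
    }

def is_up(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers HTTP 200 (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, timeouts, and refused connections
        return False
```

For example, with a hypothetical master hostname: for name, url in web_ui_urls("master").items(): print(name, url, is_up(url)).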
After the setup is complete, the master node of the Hadoop cluster will be ready to receive tasks and manage the resources of the entire cluster.