Table of Contents

hdfs

User Commands

Admin Commands

Working Directory

Relative paths are resolved against the user's HDFS home directory (for example, /user/beandog) unless an absolute path is provided.
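As a rough illustration of that rule, the hypothetical helper below mimics how a relative path gets expanded against /user/<username>. It doesn't talk to HDFS at all, and passing the username as an argument is an assumption for the sake of the example (HDFS normally takes it from the user running the command).

```shell
# Hypothetical helper: expand a path the way HDFS treats relative paths,
# i.e. relative to the default working directory /user/<username>.
hdfs_abs_path() {
  p="$1"
  user="${2:-$(id -un)}"
  case "$p" in
    "") printf '/user/%s\n' "$user" ;;          # no path: the home directory
    /*) printf '%s\n' "$p" ;;                   # already absolute: unchanged
    *)  printf '/user/%s/%s\n' "$user" "$p" ;;  # relative: prefix home dir
  esac
}

# So, for the user beandog, these two commands list the same directory:
#   hdfs dfs -ls work-project
#   hdfs dfs -ls /user/beandog/work-project
```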

Removing an HDFS node

At some point you may need to remove an HDFS node (a DataNode) from the cluster, for example to do maintenance on it. First check the filesystem to see if there are any issues, then tell the HDFS NameNode about the removal so it can copy the node's data to the remaining nodes.

Before doing anything, get a report about the node to see what its status is.

hdfs dfsadmin -report
Name: 192.168.12.24:50010 (hadoop-node4.lan)
Hostname: hadoop-node4.lan
Decommission Status : Normal
Configured Capacity: 5487524069376 (4.99 TB)
DFS Used: 9204400128 (8.57 GB)
Non DFS Used: 279243939840 (260.07 GB)
DFS Remaining: 5199075729408 (4.73 TB)
DFS Used%: 0.17%
DFS Remaining%: 94.74%
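Since the report lists every node in the cluster, it can help to pull out just the line you care about. This is a small sketch, assuming the report keeps the "Hostname:" and "Decommission Status :" layout shown above; the function name is made up for this example.

```shell
# Hypothetical helper: print the "Decommission Status" for one host from
# `hdfs dfsadmin -report` output read on stdin.
decommission_status() {
  awk -v host="$1" '
    /^Hostname:/ { current = $2 }                 # start of a node block
    /^Decommission Status/ && current == host {
      sub(/^Decommission Status *: */, "")        # strip the label
      print
      exit
    }
  '
}

# Usage:
#   hdfs dfsadmin -report | decommission_status hadoop-node4.lan
```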

Next, run a filesystem check to see if there's anything out of the ordinary. You can check a specific directory, or the whole filesystem by passing the root (/).

hdfs fsck /user/beandog/work-project
hdfs fsck /
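If you want to script this step, a minimal check is to look for the summary line fsck prints at the end. The sketch below assumes that line reads "The filesystem under path '<path>' is HEALTHY" (as opposed to CORRUPT); the function name is hypothetical.

```shell
# Hypothetical helper: succeed only if fsck output (on stdin) reports the
# checked path as HEALTHY. Assumes the usual closing line
#   The filesystem under path '<path>' is HEALTHY
fsck_is_healthy() {
  grep -q "^The filesystem under path '.*' is HEALTHY"
}

# Usage:
#   if hdfs fsck / | fsck_is_healthy; then
#     echo "OK to continue"
#   else
#     echo "fsck reported problems; investigate before removing the node" >&2
#   fi
```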

If you want, you can make a backup of a directory as well. This will take some time, since the data is stored in blocks spread across multiple nodes, and they have to be pulled from each one.

hdfs dfs -copyToLocal /user/beandog/bigdata /home/beandog/bigdata.bak
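Because the copy can run for a long time, it's easy to re-run it by accident. Here's a small wrapper sketch, with a made-up function name, that refuses to run if the local destination already exists, so an earlier backup isn't overwritten or mixed with the new copy.

```shell
# Hypothetical wrapper around `hdfs dfs -copyToLocal` that refuses to
# touch an existing local destination.
backup_hdfs_dir() {
  src="$1"
  dest="$2"
  if [ -e "$dest" ]; then
    echo "backup_hdfs_dir: $dest already exists, not overwriting" >&2
    return 1
  fi
  hdfs dfs -copyToLocal "$src" "$dest"
}

# Usage:
#   backup_hdfs_dir /user/beandog/bigdata /home/beandog/bigdata.bak
```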

On the HDFS NameNode (the primary server that keeps track of all the metadata), edit the HDFS configuration to exclude that node as a place for storage.

In the hdfs-site.xml file, add a new property named dfs.hosts.exclude. Its value is the path to a file on the NameNode's local filesystem that lists each host to be decommissioned, one per line.

In this case, I'm putting the text file at /etc/hdfs-removed-nodes.

First, the XML file addition:

<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hdfs-removed-nodes</value>
</property>
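To double-check that the property was picked up, `hdfs getconf -confKey` prints the value of a single configuration key. It reads the configuration files on the machine where it runs, so run it on the NameNode. The wrapper function below is a hypothetical convenience, not part of Hadoop.

```shell
# Hypothetical check: confirm the local configuration carries the new
# dfs.hosts.exclude property with the expected value.
check_exclude_file_configured() {
  expected="$1"
  actual=$(hdfs getconf -confKey dfs.hosts.exclude 2>/dev/null)
  if [ "$actual" = "$expected" ]; then
    echo "dfs.hosts.exclude = $actual"
  else
    echo "dfs.hosts.exclude is '$actual', expected '$expected'" >&2
    return 1
  fi
}

# Usage:
#   check_exclude_file_configured /etc/hdfs-removed-nodes
```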

And the contents of hdfs-removed-nodes:

hadoop-node4.lan
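If you decommission nodes regularly, a tiny helper keeps the exclude file in its one-host-per-line format without adding duplicates. This is a sketch; the function name is made up, and it creates the file if it doesn't exist yet.

```shell
# Hypothetical helper: append a hostname to the exclude file only if it
# isn't already listed (exact, whole-line match).
add_excluded_host() {
  file="$1"
  host="$2"
  if ! grep -qxF "$host" "$file" 2>/dev/null; then
    printf '%s\n' "$host" >> "$file"
  fi
}

# Usage (as root on the NameNode):
#   add_excluded_host /etc/hdfs-removed-nodes hadoop-node4.lan
```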

Tell the NameNode to re-read its host lists, including the exclude file:

hdfs dfsadmin -refreshNodes

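The HDFS node will now be decommissioned, which will take some time while its blocks are copied to the other nodes. You can view the status either through the NameNode web interface or with hdfs dfsadmin; the node's Decommission Status changes from Normal to Decommission in progress, and finally to Decommissioned: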

hdfs dfsadmin -report
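Rather than re-running the report by hand, you could poll it. The loop below is a sketch that assumes the report format shown earlier; the function name, the 60-second interval and the 120-check limit are arbitrary choices for this example.

```shell
# Hypothetical polling loop: re-run the report until the host's
# "Decommission Status" reads "Decommissioned", or give up.
wait_for_decommission() {
  host="$1"
  interval="${2:-60}"   # seconds between checks
  attempts="${3:-120}"  # maximum number of checks
  while [ "$attempts" -gt 0 ]; do
    state=$(hdfs dfsadmin -report | awk -v host="$host" '
      /^Hostname:/ { current = $2 }
      /^Decommission Status/ && current == host {
        sub(/^Decommission Status *: */, ""); print; exit
      }')
    echo "$host: ${state:-unknown}"
    [ "$state" = "Decommissioned" ] && return 0
    attempts=$((attempts - 1))
    [ "$attempts" -gt 0 ] && sleep "$interval"   # no sleep after the last check
  done
  echo "$host did not finish decommissioning in time" >&2
  return 1
}

# Usage:
#   wait_for_decommission hadoop-node4.lan
```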

Once the node is completely decommissioned, you can remove it from the slaves file (called workers in Hadoop 3) in your Hadoop configuration directory, and restart HDFS:

stop-dfs.sh
start-dfs.sh
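Removing the host from the slaves file can be scripted too. Here's a minimal sketch with a made-up function name; the usage line assumes your configuration directory is in $HADOOP_CONF_DIR, so adjust the path to wherever your slaves (or workers) file lives.

```shell
# Hypothetical helper: drop one hostname (exact, whole-line match) from the
# slaves/workers file, leaving the other entries alone.
remove_worker_host() {
  file="$1"
  host="$2"
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  tmp="$file.tmp.$$"
  grep -vxF "$host" "$file" > "$tmp"
  cat "$tmp" > "$file"   # rewrite in place so owner and permissions are kept
  rm -f "$tmp"
}

# Usage:
#   remove_worker_host "$HADOOP_CONF_DIR/slaves" hadoop-node4.lan
#   stop-dfs.sh
#   start-dfs.sh
```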