Working Directory
The user home directory is assumed (for example, /user/beandog) unless a specific path is provided.
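As a quick illustration, these two commands point at the same HDFS location (the work-project directory here is just an example):
hdfs dfs -ls work-project
hdfs dfs -ls /user/beandog/work-project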
Removing an HDFS node
For one reason or another, you may need to remove an HDFS node from the cluster; doing maintenance on it would be one example. The filesystem needs to be checked for issues, and then the HDFS NameNode needs to be told about the removal.
Before doing anything, get a report about the node to see what its status is.
hdfs dfsadmin -report
Name: 192.168.12.24:50010 (hadoop-node4.lan)
Hostname: hadoop-node4.lan
Decommission Status : Normal
Configured Capacity: 5487524069376 (4.99 TB)
DFS Used: 9204400128 (8.57 GB)
Non DFS Used: 279243939840 (260.07 GB)
DFS Remaining: 5199075729408 (4.73 TB)
DFS Used%: 0.17%
DFS Remaining%: 94.74%
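The full report covers every DataNode in the cluster. One way to narrow it down to a single node is to grep for its entry (the -A 9 just grabs the following nine lines of the report, so adjust to taste):
hdfs dfsadmin -report | grep -A 9 'Name: 192.168.12.24'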
Next, run a filesystem check to see if there's anything out of the ordinary. You can check a specific directory, or the entire filesystem starting from the root.
hdfs fsck /user/beandog/work-project
hdfs fsck /
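If you want more detail, fsck can also print the files, blocks, and block locations it checks. The output gets verbose quickly, so it's best pointed at a single directory:
hdfs fsck /user/beandog/work-project -files -blocks -locations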
If you want, you can make a backup of a directory as well. This will take some time, since the data is stored across multiple nodes, and parts of it have to be pulled from each one.
hdfs dfs -copyToLocal /user/beandog/bigdata /home/beandog/bigdata.bak
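As a rough sanity check on the copy, compare sizes on both ends. Note that hdfs dfs -du -s reports the logical size of the data, not counting replication (du -sb here assumes GNU du):
hdfs dfs -du -s /user/beandog/bigdata
du -sb /home/beandog/bigdata.bak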
On the HDFS NameNode (the primary server that keeps track of all the metadata), edit the HDFS configuration to exclude that node as a place for storage.
In the hdfs-site.xml file, add a new property named dfs.hosts.exclude. The value for the property is the path to a file on the filesystem that lists each host to be decommissioned, one per line. In this case, I'm putting the text file at /etc/hdfs-removed-nodes.
First, the XML file addition:
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hdfs-removed-nodes</value>
</property>
And the contents of hdfs-removed-nodes:
hadoop-node4.lan
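Creating that file is a one-liner, assuming your account can write to /etc on the NameNode:
echo hadoop-node4.lan > /etc/hdfs-removed-nodes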
Tell the NameNode to refresh the nodes:
hdfs dfsadmin -refreshNodes
The HDFS node will be decommissioned, which will take some time. You can view the status either through the web interface, or using hdfs dfsadmin:
hdfs dfsadmin -report
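To watch just the relevant lines, grep for the status field; the node's entry should move from Normal to Decommission in progress to Decommissioned (the -B 2 pulls in the Name and Hostname lines above each match):
hdfs dfsadmin -report | grep -B 2 'Decommission Status'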
Once the node is completely decommissioned, you can remove it from the slaves file in your Hadoop configuration directory, and restart HDFS:
stop-dfs.sh
start-dfs.sh
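One last report confirms the node is gone. If the hostname still shows up, check whether it's still listed in the exclude file:
hdfs dfsadmin -report | grep hadoop-node4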