====== hdfs ======

  * [[Hadoop]]
  * [[HDFS Filesystem]]
  * [[hdfs dfs]]
  * [[webhdfs]]

  * [[http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html|HDFS User Guide]]
  * [[http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html|HDFS Commands Guide]]
  * [[http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html|HDFS Snapshots]]

  * [[https://developer.yahoo.com/hadoop/tutorial/module2.html#commandref|HDFS Command Reference]]

==== User Commands ====

  * [[hdfs archive]]
  * [[hdfs distcp]]
  * [[hdfs dfs]]
  * [[hdfs fs]]
  * [[hdfs fsck]]
  * [[hdfs fetchdt]]

==== Admin Commands ====

  * [[hdfs balancer]]
  * [[hdfs daemonlog]]
  * [[hdfs datanode]]
  * [[hdfs dfsadmin]]
  * [[hdfs namenode]]
  * [[hdfs secondarynamenode]]

=== Working Directory ===
Relative paths are resolved against the user's home directory (e.g., ''/user/beandog'') unless an absolute path is given.
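A quick sketch of what that means in practice (assuming a user named ''beandog'' and a hypothetical ''work-project'' directory):

<code>
# These two commands refer to the same directory, since relative
# paths resolve under /user/<username>:
#   hdfs dfs -ls work-project
#   hdfs dfs -ls /user/beandog/work-project

# The implied prefix for whoever is running the command:
HDFS_HOME="/user/$(whoami)"
echo "$HDFS_HOME"
</code>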

=== Removing an HDFS node ===

For one reason or another, you may need to remove an HDFS node from the cluster; doing maintenance on it would be one example. The filesystem should be checked for any issues, and then the HDFS NameNode needs to be told about the node's removal.

Before doing anything, get a report about the node to see what its status is:

<code>
hdfs dfsadmin -report
</code>

<code>
Name: 192.168.12.24:50010 (hadoop-node4.lan)
Hostname: hadoop-node4.lan
Decommission Status : Normal
Configured Capacity: 5487524069376 (4.99 TB)
DFS Used: 9204400128 (8.57 GB)
Non DFS Used: 279243939840 (260.07 GB)
DFS Remaining: 5199075729408 (4.73 TB)
DFS Used%: 0.17%
DFS Remaining%: 94.74%
</code>

Next, run a filesystem check to see if anything is out of the ordinary. You can check a specific directory, or the root of the filesystem:

<code>
hdfs fsck /user/beandog/work-project
hdfs fsck /
</code>

If you want, you can make a backup of a directory as well. This will take some time, since the data is stored across multiple nodes, and parts of it have to be pulled from each one.

<code>
hdfs dfs -copyToLocal /user/beandog/bigdata /home/beandog/bigdata.bak
</code>

On the HDFS NameNode (the primary server that keeps track of all the metadata), edit the HDFS configuration so the node is excluded as a place for storage.

In the ''hdfs-site.xml'' file, add a new property named ''dfs.hosts.exclude''. Its value is the path to a file on the NameNode's filesystem that lists each host to be decommissioned, one per line.

In this case, I'm putting the text file at ''/etc/hdfs-removed-nodes''.

First, the XML file addition:

<code>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hdfs-removed-nodes</value>
</property>
</code>

And the contents of ''/etc/hdfs-removed-nodes'':

<code>
hadoop-node4.lan
</code>

Tell the NameNode to re-read its include and exclude files:

<code>
hdfs dfsadmin -refreshNodes
</code>

The HDFS node will then be decommissioned, which will take some time. You can view the status either through the web interface, or using ''hdfs dfsadmin'':

<code>
hdfs dfsadmin -report
</code>
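If you'd rather not scan the full report each time, you can filter it down to each node's name and decommission state. A sketch (the labels match the sample report output shown earlier):

<code>
# Show only the identity and decommission state of each datanode:
hdfs dfsadmin -report | grep -E '^(Name|Hostname|Decommission Status)'
</code>

While the node is draining, the status line changes from ''Normal'' to ''Decommission in progress'', and finally to ''Decommissioned''.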

Once the node is completely decommissioned, you can remove it from the ''slaves'' file in your Hadoop configuration directory, and restart HDFS:

<code>
stop-dfs.sh
start-dfs.sh
</code>
