====== HDFS (Hadoop Distributed File System) ====== * [[Hadoop]] * [[hdfs]] ** NameNode ** - master server that manages the file system namespace and regulates access to files by clients ==== About ==== From the documentation: HDFS is the primary distributed storage used by Hadoop applications. A HDFS cluster primarily consists of a NameNode that manages the file system metadata and DataNodes that store the actual data. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data.