====== CentOS HortonWorks ======

  * [[CentOS]]
  * [[Hadoop]]

In this guide, CentOS 6.6 is used, coupled with HortonWorks Data Platform (HDP) 2.1.

** Download the minimal ISO **

The netinstall ISO is an option, but since the size difference between it and the minimal ISO is negligible, I prefer the minimal one. In addition, the minimal ISO installs some basic system packages.

<code>
wget http://
</code>

** Boot the ISO in text mode **

To make life simpler, or if you're using a headless server, boot in text mode.

At the boot menu, hit Tab and add ''text'' to the end of the kernel line.

Alternatively, you can go through the default graphical installer.

** Disk partitioning and filesystems **

Hadoop comes with some recommendations for setting up the filesystems on each node:

  * Don't use LVM to manage partitions
  * Either don't install a swap partition, or set ''vm.swappiness'' to ''0''
  * Set the ''noatime'' mount option on the data partitions
  * Use ext3 or ext4 as the filesystem type
  * Disable the root reserved block amount

Using the text installer, your partitions are set up automatically. It will create a swap partition and a separate partition for the boot loader.

You only get the option to partition the drives yourself through the GUI installer; the text installer auto-formats the drives with its defaults.

Set ''noatime'' and any other mount options afterwards in ''/etc/fstab''.

** DHCP request **

If you didn't do the netinstall, then your server might not get a DHCP address when booting up the first time. First, get a DHCP address for your existing install, assuming your network device is ''eth0'':

<code>
dhclient eth0
</code>

** Install packages **

Using yum, install some basic packages:

<code>
yum -y install man wget vim ntp ntpdate chkconfig ntsysv acpid screen sudo bind-utils nano rsync
</code>

Start services:

<code>
/etc/init.d/ntpd start
/etc/init.d/ntpdate start
/etc/init.d/acpid start
</code>

** DHCP client on boot **

Edit ''/etc/sysconfig/network-scripts/ifcfg-eth0'' so the interface requests a DHCP address on boot:

<code>
ONBOOT=yes
</code>

** Disable iptables **

Unless needed, disable iptables per HortonWorks' recommendation:

<code>
chkconfig iptables off
chkconfig ip6tables off
</code>

** NTP **

It's best to have every Hadoop node in sync with an NTP server so that there is no clock drift between the servers.

<code>
chkconfig ntpd on
chkconfig ntpdate on
</code>

** Max open files and processes **

Set the ulimit values for all users on the system. Hadoop needs this since it opens a lot of files and creates a lot of processes; with the general default of 1024 there will be a performance impact.

In ''/etc/security/limits.conf'', add:

<code>
* - nofile 32768
* - nproc 65536
</code>

+ | ** Hostnames ** | ||
+ | |||
+ | Again, to improve performance for Hadoop, set DNS entries for nodes directly in the ''/ | ||
+ | |||
+ | < | ||
+ | 192.168.12.1 hadoop-node1 | ||
+ | 192.168.12.2 hadoop-node2 | ||
+ | 192.168.12.3 hadoop-node3 | ||
+ | </ | ||
+ | |||
+ | You would also add an entry for the server you are running on. | ||
+ | |||
+ | < | ||
+ | 127.0.0.1 localhost | ||
+ | 192.168.12.1 hadoop-node1 | ||
+ | </ | ||
+ | |||
Set your server's hostname:

<code>
hostname hadoop-node1
</code>

Set the hostname on boot for CentOS. Add this to ''/etc/sysconfig/network'':

<code>
HOSTNAME=hadoop-node1
</code>

Hadoop also recommends disabling IPv6, in the same file:

<code>
NETWORKING_IPV6=no
</code>

** Setup SSH pubkeys **

For each server, set up an SSH key pair without a passphrase for root. Ambari will use it to communicate with the other servers and install packages.

<code>
ssh-keygen
</code>

** SELinux **

Depending on your install, SELinux may or may not be enabled.

Disable it in the running instance:

<code>
setenforce 0
</code>

And also disable it on boot in ''/etc/selinux/config'':

<code>
SELINUX=disabled
</code>

Note that if you only disable it in the running state and then install Ambari and run ''ambari-server setup'', SELinux will come back after a reboot unless it is also disabled in ''/etc/selinux/config''.

** Disable transparent hugepages **

HortonWorks recommends disabling this memory setting since it may cause problems with network lookups.

Disable it in the running system, and also add the command to ''/etc/rc.local'' so it persists across reboots:

<code>
echo never > /sys/kernel/mm/transparent_hugepage/enabled
</code>

** Primary node pubkeys **

The primary node that has Ambari installed will need its pubkey installed on all the nodes, **including itself**.

<code>
ssh-copy-id hadoop-node1
ssh-copy-id hadoop-node2
ssh-copy-id hadoop-node3
</code>

Once everything above is done on all the nodes, you're ready to install Ambari and use it to deploy a Hadoop cluster.

** Ambari **

Install the Ambari repo, which we'll use to set up the cluster. Ambari only runs on one server (for example, hadoop-node1). We'll use it to install HDP.

<code>
cd /etc/yum.repos.d/
wget http://
</code>

Install the package through yum:

<code>
yum -y install ambari-server
</code>

Finally, run through the Ambari server setup. It will pull in the necessary packages itself. Using the defaults is fine.

<code>
ambari-server setup
</code>

Start up the Ambari server:

<code>
/etc/init.d/ambari-server start
</code>

And then access your Ambari instance on port 8080 of your server: ''http://hadoop-node1:8080''.

** Ambari host checks **

When Ambari sets up the new nodes, it will look through all of them to check for service problems.

You can run the check manually from the primary node:

<code>
python /
</code>