centos_hortonworks [2016/04/26 15:15] (current) – created - external edit 127.0.0.1
Line 1: Line 1:
 +====== CentOS HortonWorks ======
  
 +  * [[CentOS]]
 +  * [[Hadoop]]
 +
 +In this guide, CentOS 6.6 is used, coupled with HortonWorks Data Platform (HDP) 2.1.
 +
 +** Download the minimal ISO **
 +
 +The netinstall ISO is an option, but since the size difference between that and the minimal is negligible, I prefer the minimal one. In addition, the minimal will install some basic system packages.
 +
 +<code>
 +wget http://archive.kernel.org/centos-vault/6.6/isos/x86_64/CentOS-6.6-x86_64-minimal.iso
 +</code>
 +
 +** Boot the ISO in text mode **
 +
 +To make life simpler or if using a headless server, boot in text mode.
 +
 +At the boot menu, hit Tab, and add ''text'' to the kernel options.
 +
 +Alternatively, you can do a [[CentOS Kickstart|kickstart]] install.
 +
 +** Disk partitioning and filesystems **
 +
 +Hadoop comes with some recommendations for setting up the filesystem:
 +
 +  * Don't use LVM to manage partitions
 +  * Either do not create a swap partition, or set ''vm.swappiness'' to 0 in ''/etc/sysctl.conf''
 +  * Set the ''noatime'' flag for the partitions
 +  * Use ext3 or ext4 as the filesystem type
 +  * Disable the root reserved block percentage
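 +
 +The ''noatime'' and reserved-space items above could be applied along these lines. This is only a sketch, not part of the original setup: ''add_noatime'' is a hypothetical helper, and ''/data'' on ''/dev/sdb1'' is an assumed example data partition. The helper only prints the modified table, so it can be reviewed before it replaces ''/etc/fstab''.
 +
```shell
# Hypothetical helper: print the fstab given as $1 with "noatime" appended to
# the mount options (4th field) of the entry whose mount point (2nd field)
# equals $2. Comment lines and entries that already list noatime are printed
# unchanged. Note: awk rejoins the fields of a modified line with single spaces.
add_noatime() {
    awk -v mp="$2" '
        $1 !~ /^#/ && $2 == mp && ("," $4 ",") !~ /,noatime,/ { $4 = $4 ",noatime" }
        { print }
    ' "$1"
}

# Example usage as root (/data and /dev/sdb1 are assumed names):
#   add_noatime /etc/fstab /data > /tmp/fstab.new   # review, then copy into place
#   mount -o remount,noatime /data                  # apply without a reboot
#   tune2fs -m 0 /dev/sdb1                          # reserve 0% of blocks for root
```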
 +
 +The text installer does not let you partition the drives; that option is only available in the GUI install. In text mode the partitions are set up automatically: the installer uses LVM, creates a swap partition and a separate one for the boot loader, and formats root as ext4.
 +
 +Since the automatic layout includes swap, set ''vm.swappiness'' to 0 in ''/etc/sysctl.conf'', and apply it to the running system. With this setting the kernel avoids swapping unless it is needed to prevent an out-of-memory condition.
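 +
 +One way to make the ''vm.swappiness'' change repeatable is sketched below. The helper name ''set_sysctl_conf'' is made up for this example; it rewrites the given file so that the key appears exactly once, which keeps repeated runs from piling up duplicate lines.
 +
```shell
# Hypothetical helper: ensure "key = value" appears exactly once in the
# sysctl-style file $1, replacing any existing assignment of that key.
# The body runs in a subshell so its variables do not leak into the caller.
set_sysctl_conf() (
    file=$1; key=$2; value=$3
    touch "$file" || exit 1
    tmp=$(mktemp) || exit 1
    # Keep every line except existing assignments of the key
    # (grep exits 1 when nothing is left, which is fine here).
    grep -v "^[[:space:]]*${key}[[:space:]]*=" "$file" > "$tmp" || true
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file to keep its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
)

# Example usage as root:
#   set_sysctl_conf /etc/sysctl.conf vm.swappiness 0
#   sysctl -p                      # load /etc/sysctl.conf into the running kernel
#   cat /proc/sys/vm/swappiness    # confirm the running value
```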
 +
 +** DHCP request **
 +
 +If you didn't use the netinstall ISO, your server might not get a DHCP address when it boots for the first time. Request one manually, assuming your network device is ''eth0'':
 +
 +<code>
 +dhclient eth0
 +</code>
 +
 +** Install packages **
 +
 +Using yum, install some basic packages:
 +
 +<code>
 +yum -y install man wget vim ntp ntpdate chkconfig ntsysv acpid screen sudo bind-utils nano rsync
 +</code>
 +
 +Start services:
 +
 +<code>
 +# Run ntpdate first: it sets the clock once and cannot use the NTP port while ntpd is running
 +/etc/init.d/ntpdate start
 +/etc/init.d/ntpd start
 +/etc/init.d/acpid start
 +</code>
 +
 +** DHCP client on boot **
 +
 +Edit ''/etc/sysconfig/network-scripts/ifcfg-eth0'' so that ''eth0'' is brought up at boot (it requests a DHCP lease if ''BOOTPROTO=dhcp'' is set in the same file):
 +
 +<code>
 +ONBOOT=yes
 +</code>
 +
 +** Disable iptables **
 +
 +Unless you need a firewall, disable iptables per HortonWorks' recommendation (''chkconfig'' takes effect at the next boot):
 +
 +<code>
 +chkconfig iptables off
 +chkconfig ip6tables off
 +</code>
 +
 +** NTP **
 +
 +It's best to keep every Hadoop node in sync with an NTP server so that the clocks do not drift apart between servers.
 +
 +<code>
 +chkconfig ntpd on
 +chkconfig ntpdate on
 +</code>
 +
 +** Max open files and processes **
 +
 +Raise the ulimit values for all users on the system. Hadoop needs this because it opens a lot of files and creates a lot of processes, and the usual default limit of 1024 hurts performance.
 +
 +In ''/etc/security/limits.conf'':
 +
 +<code>
 +* - nofile 32768
 +* - nproc 65536
 +</code>
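 +
 +The new limits only apply to sessions started after the change, so it can be worth checking them after logging in again. The snippet below is a rough check, with ''limit_at_least'' as a made-up helper name. One assumption to verify on your own system: a stock ''/etc/security/limits.d/90-nproc.conf'' may cap ''nproc'' for ordinary users and take precedence over ''limits.conf''.
 +
```shell
# Hypothetical helper: succeed if the current shell's soft limit for the given
# ulimit flag is at least the wanted value ("unlimited" always passes).
# -n is the open-file limit; in bash, -u is the process limit.
limit_at_least() {
    current=$(ulimit "$1")
    [ "$current" = "unlimited" ] || [ "$current" -ge "$2" ]
}

# Example usage in a fresh login shell (values from limits.conf above):
#   limit_at_least -n 32768 || echo "nofile is only $(ulimit -n)"
#   limit_at_least -u 65536 || echo "nproc is only $(ulimit -u)"
```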
 +
 +** Hostnames **
 +
 +Again, to improve performance for Hadoop, add entries for the nodes directly to ''/etc/hosts''. This saves DNS lookups for the servers.
 +
 +<code>
 +192.168.12.1 hadoop-node1
 +192.168.12.2 hadoop-node2
 +192.168.12.3 hadoop-node3
 +</code>
 +
 +Also make sure there is an entry for the server you are running on:
 +
 +<code>
 +127.0.0.1 localhost
 +192.168.12.1 hadoop-node1
 +</code>
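 +
 +When the same entries have to go onto every node, a small idempotent helper avoids duplicate lines. This is a sketch only; ''add_host_entry'' is a hypothetical name, and the addresses are the example ones used above.
 +
```shell
# Hypothetical helper: append "IP NAME" to the hosts file $1 unless a
# non-comment line for that hostname already exists.
add_host_entry() {
    file=$1; ip=$2; name=$3
    if ! grep -qE "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$file"; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Example usage as root:
#   for n in 1 2 3; do
#       add_host_entry /etc/hosts "192.168.12.$n" "hadoop-node$n"
#   done
#   getent hosts hadoop-node2    # confirm that the name resolves
```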
 +
 +Set your server's hostname:
 +
 +<code>
 +hostname hadoop-node1
 +</code>
 +
 +Set the hostname on boot for CentOS. Add this to ''/etc/sysconfig/network'':
 +
 +<code>
 +HOSTNAME=hadoop-node1
 +</code>
 +
 +Hadoop also recommends disabling IPv6. Add this to the same file:
 +
 +<code>
 +NETWORKING_IPV6=no
 +</code>
 +
 +** Setup SSH pubkeys **
 +
 +For each server, set up an SSH key pair without a passphrase for root. Ambari will use it to communicate with the other servers and install packages.
 +
 +<code>
 +ssh-keygen
 +</code>
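 +
 +Run as above, ''ssh-keygen'' prompts for the file name and passphrase. For scripted installs, a non-interactive variant might look like this. ''make_key'' is a hypothetical wrapper, and the RSA key type and 4096-bit size are assumptions rather than anything Ambari requires.
 +
```shell
# Hypothetical helper: create a passphrase-less RSA key pair at the given
# path, unless a private key already exists there (never overwrite a key).
make_key() {
    keyfile=$1
    if [ -f "$keyfile" ]; then
        return 0
    fi
    # -m 700 only applies if the directory has to be created.
    mkdir -p -m 700 "$(dirname "$keyfile")" || return 1
    # -q quiet, -N "" empty passphrase, -f output file
    ssh-keygen -q -t rsa -b 4096 -N "" -f "$keyfile"
}

# Example usage as root:
#   make_key /root/.ssh/id_rsa
```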
 +
 +** SELinux **
 +
 +Depending on your install, SELinux may or may not be enabled.
 +
 +Disable it in the running instance:
 +
 +<code>
 +setenforce 0
 +</code>
 +
 +Also disable it at boot in ''/etc/selinux/config'':
 +
 +<code>
 +SELINUX=disabled
 +</code>
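 +
 +To see the difference between the boot-time setting and the running state, the two can be compared. A small sketch, with ''selinux_configured_mode'' as a made-up helper name:
 +
```shell
# Hypothetical helper: print the SELINUX= value from an SELinux config file,
# ignoring comment lines and the separate SELINUXTYPE= setting.
# Prints nothing if the setting is absent.
selinux_configured_mode() {
    sed -n 's/^[[:space:]]*SELINUX=[[:space:]]*\([A-Za-z]*\).*/\1/p' "$1" | tail -n 1
}

# Example usage:
#   echo "at boot: $(selinux_configured_mode /etc/selinux/config)"
#   echo "running: $(getenforce)"
# After "setenforce 0", getenforce reports Permissive; it reports Disabled
# only once the system has rebooted with SELINUX=disabled.
```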
 +
 +Note that if you only disable it in the running system (''setenforce 0'' just switches it to permissive mode), then install Ambari and run ''ambari-server setup'', Ambari will think that SELinux is still enabled. It is best to reboot after everything else is complete.
 +
 +** Disable transparent hugepages **
 +
 +HortonWorks recommends disabling this memory setting since it may cause problems with network lookups.
 +
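 +Disable it in the running system, and also add the same command to ''/etc/rc.local'' so that it is reapplied on every boot. On CentOS 6 kernels the directory may be named ''redhat_transparent_hugepage'' instead, so check which one exists under ''/sys/kernel/mm''.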
 +
 +<code>
 +echo never > /sys/kernel/mm/transparent_hugepage/enabled
 +</code>
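 +
 +Because the sysfs directory name can differ between kernels, a script can look for it before writing. The sketch below assumes the two directory names ''transparent_hugepage'' and ''redhat_transparent_hugepage''; ''thp_dir'' and ''thp_active'' are hypothetical helpers.
 +
```shell
# Hypothetical helper: print the first transparent hugepage control directory
# that exists under the given sysfs root (default: /sys/kernel/mm).
thp_dir() {
    root=${1:-/sys/kernel/mm}
    for d in "$root/transparent_hugepage" "$root/redhat_transparent_hugepage"; do
        if [ -d "$d" ]; then
            echo "$d"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: read a line such as "always madvise [never]" on stdin
# and print the bracketed word, which is the active setting.
thp_active() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Example usage as root:
#   dir=$(thp_dir) && echo never > "$dir/enabled" && thp_active < "$dir/enabled"
```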
 +
 +** Primary node pubkeys **
 +
 +The primary node that has Ambari installed will need its pubkey installed on all the nodes **including itself**.
 +
 +<code>
 +ssh-copy-id hadoop-node1
 +ssh-copy-id hadoop-node2
 +ssh-copy-id hadoop-node3
 +</code>
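 +
 +With more than a few nodes, the copy step can be looped and followed by a check that key-based login really works. A sketch under assumptions: the node names are the example ones from ''/etc/hosts'', and ''run'' and ''distribute_key'' are made-up helper names. Setting ''DRY_RUN=1'' only prints the commands.
 +
```shell
# Node names from the /etc/hosts example; adjust to your cluster.
NODES="hadoop-node1 hadoop-node2 hadoop-node3"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy root's public key to every node, then verify the login.
distribute_key() {
    for node in $NODES; do
        run ssh-copy-id "root@$node" || return 1
        # BatchMode makes ssh fail instead of prompting if the key is not accepted.
        run ssh -o BatchMode=yes "root@$node" true || return 1
    done
}

# Example usage on the primary node:
#   DRY_RUN=1 distribute_key    # preview the commands
#   distribute_key              # prompts once per node for the root password
```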
 +
 +Once everything above is done on all the nodes, you're ready to install Ambari and use it to deploy a Hadoop cluster.
 +
 +** Ambari **
 +
 +Install the Ambari repo. Ambari runs on only one server (for example, hadoop-node1), and we'll use it to set up the cluster and install HDP.
 +
 +<code>
 +cd /etc/yum.repos.d
 +wget http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.6.1/ambari.repo
 +</code>
 +
 +Install the package through yum:
 +
 +<code>
 +yum -y install ambari-server
 +</code>
 +
 +Finally, run through the Ambari server setup. It will pull in the packages it needs itself. Using the defaults is fine.
 +
 +<code>
 +ambari-server setup
 +</code>
 +
 +Start up the Ambari server:
 +
 +<code>
 +/etc/init.d/ambari-server start
 +</code>
 +
 +Then access your Ambari instance on port 8080 of your server: <nowiki>http://hadoop-node1:8080/</nowiki>. The default user and password set by Ambari are ''admin'' and ''admin''.
 +
 +** Ambari host checks **
 +
 +When Ambari sets up the new nodes, it will check each of them for service problems.
 +
 +If the checks report leftovers on a node, you can run the host cleanup script that ships with the Ambari agent manually on that node:
 +
 +<code>
 +python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent --skip=users
 +</code>
