Differences

This shows you the differences between two versions of the page.

@@ Line 1: / Line 1: @@
+====== CentOS HortonWorks ======
+  * [[CentOS]]
+  * [[Hadoop]]
+In this guide, CentOS 6.6 is used, coupled with HortonWorks Data Platform (HDP) 2.1.
+** Download the minimal ISO **
+The netinstall ISO is an option, but since the size difference between that and the minimal is negligible, I prefer the minimal one. In addition, the minimal will install some basic system packages.
+<code>
+wget http://archive.kernel.org/centos-vault/6.6/isos/x86_64/CentOS-6.6-x86_64-minimal.iso
+</code>
+** Boot the ISO in text mode **
+To make life simpler or if using a headless server, boot in text mode.
+At the boot menu, hit Tab, and add ''text'' to the kernel options.
+Alternatively, you can do a [[CentOS Kickstart|kickstart]] install.
+** Disk partitioning and filesystems **
+Hadoop comes with some recommendations for setting up the filesystem
+  * Don't use LVM to manage partitions
+  * Either do not install swap partition or set ''vm.swappiness'' to 0 in ''sysctl.conf''
+  * Set the ''noatime'' flag for the partitions
+  * Use ext3 or ext4 as the filesystem type
+  * Disable root reserved amount
+Using the text installer, your partitions are set up automatically. It will install a swap partition, and a separate one for the boot loader.
+You'll only get an option to partition the drives through the GUI install. So in these cases of a text one, it'll auto-format, use LVM, and create an ext4 filesystem for root.
+Set ''vm.swappiness'' to 0 in ''/etc/sysctl.conf'', and apply it to the running system. This will let the kernel use swap only if something is going to OOM.
+** DHCP request **
+If you didn't do the netinstall, then your server might not get a DHCP address when booting up the first time. First, get a DHCP address for your existing install, assuming your network device is ''eth0'':
+<code>
+dhclient eth0
+</code>
+** Install packages **
+Using yum, install some basic packages:
+<code>
+yum -y install man wget vim ntp ntpdate chkconfig ntsysv acpid screen sudo bind-utils nano rsync
+</code>
+Start services:
+<code>
+/etc/init.d/ntp start
+/etc/init.d/ntpdate start
+/etc/init.d/acpid start
+</code>
+** DHCP client on boot **
+Edit ''/etc/sysconfig/network-scripts/ifcfg-eth0'' so it will run it on boot:
+<code>
+ONBOOT=yes
+</code>
+** Disable iptables **
+Unless needed, disable iptables per HortonWork's recommendation:
+<code>
+chkconfig iptables off
+chkconfig ip6tables off
+</code>
+** NTP **
+It's best to have a Hadoop node in sync with an NTP server so that there is no drift between each server.
+<code>
+chkconfig ntp on
+chkconfig ntpdate on
+</code>
+** Max open files and processes **
+Set the ulimit values for all users on the system. Hadoop will need this since it opens a lot of files and creates a lot of processes. There will be performance impact with the general defaults of 1024.
+In ''/etc/security/limits.conf'':
+<code>
+* - nofile 32768
+* - nproc 65536
+</code>
+** Hostnames **
+Again, to improve performance for Hadoop, set DNS entries for nodes directly in the ''/etc/hosts''. This saves DNS lookups for the servers.
+<code>
+.168.12.1 hadoop-node1
+.168.12.2 hadoop-node2
+.168.12.3 hadoop-node3
+</code>
+You would also add an entry for the server you are running on.
+<code>
+.0.0.1 localhost
+.168.12.1 hadoop-node1
+</code>
+Set your server's hostname:
+<code>
+hostname hadoop-node1
+</code>
+Set the hostname on boot for CentOS. Add this to ''/etc/sysconfig/network'':
+<code>
+HOSTNAME=hadoop-node1
+</code>
+Hadoop also recommends disabling IPv6:
+<code>
+NETWORKING_IPV6=no
+</code>
+** Setup SSH pubkeys **
+For each server, set up an SSH public key without a passphrase for root. Ambari will use it to communicate with the other servers and install packages.
+<code>
+ssh-keygen
+</code>
+** SELinux **
+Depending on your install, SELinux may or may not be enabled.
+Disable it in the running instance:
+<code>
+setenforce 0
+</code>
+And also disable it when booting in ''/etc/selinux/config'':
+<code>
+SELINUX=disabled
+</code>
+Note that if you disable it in your running state, and install Ambari and run ''ambari-server setup'', it will think that SE Linux is still enabled. Best to reboot, then, after everything else is complete.
+** Disable transparent hugepages **
+HortonWorks recommends disabling this memory setting since it may cause problems with network lookups.
+Disable it in the running system, and also add to ''/etc/rc.local'' so it's preserved on boot.
+<code>
+echo never > /sys/kernel/mm/transparent_hugepage/enabled
+</code>
+** Primary node pubkeys **
+The primary node that has Ambari installed will need it's pubkey installed on all the nodes **including itself**.
+<code>
+ssh-copy-id hadoop-node1
+ssh-copy-id hadoop-node2
+ssh-copy-id hadoop-node3
+</code>
+Once everything above is done above on all the nodes, you're ready to install Ambari and use it to deploy a Hadoop cluster.
+** Ambari **
+Install the Ambari repo, which we'll use to set up the cluster. Ambari only runs on one server (for example, hadoop-node1). We'll use it to install HDP.
+<code>
+cd /etc/yum.repos.d
+wget http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.6.1/ambari.repo
+</code>
+Install the package through yum:
+<code>
+yum -y install ambari-server
+</code>
+Finally, run through the ambari server setup. It will pull in necessary packages itself. Using the defaults is fine.
+<code>
+ambari-server setup
+</code>
+Start up the Ambari server:
+<code>
+/etc/init.d/ambari-server start
+</code>
+And then access your Ambari instance on port 8080 at your server - <nowiki>http://hadoop-node1:8080/</nowiki> The default user and password set by Ambari is ''admin'' and ''admin''.
+** Ambari host checks **
+When Ambari sets up the new nodes, it will look through all of them to check for service problems.
+You can run the check manually from the primary node:
+<code>
+python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent --skip=users
+</code>

Trace:

Differences

Navigation

Search

Toolbox

QR Code