Saturday, March 15, 2014

Apache Hadoop - How to Configure Namenode HA Automatic Failover

To configure HA automatic failover, you need ZooKeeper and ZKFailoverController (ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  • Failure detection - each namenode machine in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other namenode that a failover should be triggered.
  • Active namenode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active namenode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.

The ZKFailoverController (ZKFC) is a new component: a ZooKeeper client that also monitors and manages the state of the namenode. Each machine that runs a namenode also runs a ZKFC, and that ZKFC is responsible for:

  • Health monitoring - the ZKFC pings its local namenode on a periodic basis with a health-check command (controlled by the ha.health-monitor.check-interval.ms parameter; see the core-site.xml sketch after this list). As long as the namenode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy.
  • ZooKeeper session management - when the local namenode is healthy, the ZKFC holds a session open in ZooKeeper. If the local namenode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
  • ZooKeeper-based election - if the local namenode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election" and is responsible for running a failover to make its local namenode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local namenode transitions to the active state.

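If the defaults do not suit your environment, both the health-check interval and the ZooKeeper session timeout used by the ZKFC can be tuned in core-site.xml. A minimal sketch; the values shown are, as far as I know, the defaults, so treat them as illustrative only:

<property>
   <name>ha.health-monitor.check-interval.ms</name>
   <value>1000</value>
   <description>How often (in ms) the ZKFC health-checks its local namenode.</description>
</property>

<property>
   <name>ha.zookeeper.session-timeout.ms</name>
   <value>5000</value>
   <description>ZooKeeper session timeout (in ms) used by the ZKFC; a shorter timeout means faster failure detection but more risk of spurious failovers.</description>
</property>
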
I assume you have already set up a ZooKeeper cluster running on three or more nodes. In my case:

  • hadoop1.com - namenode, Journalnode, Zookeeper, resourcemanager, proxyserver, jobhistoryserver
  • hadoop2.com - secondary namenode, Journalnode, Zookeeper
  • hadoop3.com - datanode, Journalnode, Zookeeper, nodemanager

Shut down the cluster:
It is not currently possible to transition from a manual failover setup to an automatic failover setup while the cluster is running.
On hadoop3:
[yarn@hadoop3 ~]$ ./stop_nmanager.sh
stopping nodemanager
[hdfs@hadoop3 ~]$ ./stop_jnode.sh
stopping journalnode
[hdfs@hadoop3 ~]$ ./stop_datanode.sh
stopping datanode

Verify that only the ZooKeeper process is running:
[root@hadoop3 opt]# jps
43082 QuorumPeerMain
43364 Jps

On hadoop2:
[hdfs@hadoop2 ~]$ ./stop_jnode.sh
stopping journalnode
[hdfs@hadoop2 ~]$ ./stop_namenode.sh
stopping namenode

On hadoop1:
[hdfs@hadoop1 ~]$ ./stop_namenode.sh
stopping namenode
[hdfs@hadoop1 ~]$ ./stop_jnode.sh
stopping journalnode
[yarn@hadoop1 ~]$ ./stop_rmanager.sh
stopping resourcemanager
[yarn@hadoop1 ~]$ ./stop_proxyserver.sh
stopping proxyserver
[mapred@hadoop1 ~]$ ./stop_historyserver.sh
stopping historyserver

On all servers, as user root, verify that only the ZooKeeper process is running:
[root@hadoop1/2/3 opt]# jps
43082 QuorumPeerMain
43376 Jps

Update the hdfs-site.xml and core-site.xml configuration files:
Edit the hdfs-site.xml file and add:
<property>
   <name>dfs.ha.automatic-failover.enabled</name>
   <value>true</value>
   <description>The cluster should be set up for automatic failover.</description>
</property>

<property>
    <name>dfs.client.failover.proxy.provider.testcluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

Edit the core-site.xml file and add:
<property>
   <name>ha.zookeeper.quorum</name>
   <value>hadoop1.com:2181,hadoop2.com:2181,hadoop3.com:2181</value>
   <description>The list of host-port pairs running the ZooKeeper service.</description>
</property>
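
Before relying on this quorum, it is worth checking that each listed ZooKeeper server actually answers on port 2181. One quick check (assuming nc/netcat is installed) uses ZooKeeper's built-in "ruok" four-letter command; a healthy server replies "imok":
$ echo ruok | nc hadoop1.com 2181
imok
Repeat for hadoop2.com and hadoop3.com.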

Distribute the new hdfs-site.xml and core-site.xml to all servers (hadoop2 and hadoop3).
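For example, assuming the configuration directory is /opt/hadoop-2.2.0/etc/hadoop (as in the log output below) and that the hdfs user has ssh access to the other hosts, something like this will do:
[hdfs@hadoop1 ~]$ scp /opt/hadoop-2.2.0/etc/hadoop/hdfs-site.xml /opt/hadoop-2.2.0/etc/hadoop/core-site.xml hadoop2.com:/opt/hadoop-2.2.0/etc/hadoop/
[hdfs@hadoop1 ~]$ scp /opt/hadoop-2.2.0/etc/hadoop/hdfs-site.xml /opt/hadoop-2.2.0/etc/hadoop/core-site.xml hadoop3.com:/opt/hadoop-2.2.0/etc/hadoop/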

Initializing HA state in ZooKeeper:
Initialize the required state in ZooKeeper. Run the following command from one of the namenode hosts (hadoop1):
$ hdfs zkfc -formatZK
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /opt/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/03/10 15:35:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/10 15:35:53 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at hadoop1.com/192.168.143.131:8020
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:host.name=hadoop1
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_51
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/jdk1.7.0/jre
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/opt/hadoop-2.2.0/etc/hadoop:/opt/hadoop-2.2.0/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:.../contrib/capacity-scheduler/*.jar
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/opt/hadoop-2.2.0/lib/native
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-431.5.1.el6.x86_64
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:user.name=hdfs
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hdfs
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hdfs
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1.com:2181,hadoop2.com:2181,hadoop3.com:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@74b22c1d
14/03/10 15:35:54 INFO zookeeper.ClientCnxn: Opening socket connection to server hadoop2/192.168.143.132:2181. Will not attempt to authenticate using SASL (unknown error)
14/03/10 15:35:54 INFO zookeeper.ClientCnxn: Socket connection established to hadoop2/192.168.143.132:2181, initiating session
14/03/10 15:35:54 INFO zookeeper.ClientCnxn: Session establishment complete on server hadoop2/192.168.143.132:2181, sessionid = 0x144ad5a7ac70001, negotiated timeout = 5000
14/03/10 15:35:54 INFO ha.ActiveStandbyElector: Session connected.
14/03/10 15:35:54 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/testcluster in ZK.
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Session: 0x144ad5a7ac70001 closed
14/03/10 15:35:54 INFO zookeeper.ClientCnxn: EventThread shut down

Check that the new znode "hadoop-ha" has been created:
$ /opt/zookeeper-3.4.5/bin/zkCli.sh -server hadoop2:2181
Connecting to hadoop2:2181
2014-03-10 15:42:05,071 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2014-03-10 15:42:05,077 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=hadoop1
2014-03-10 15:42:05,078 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.7.0_51
2014-03-10 15:42:05,079 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2014-03-10 15:42:05,079 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/jdk1.7.0/jre
2014-03-10 15:42:05,080 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/opt/zookeeper-3.4.5/bin/../build/classes:/opt/zookeeper-3.4.5/bin/../build/lib/*.jar:/opt/zookeeper-3.4.5/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper-3.4.5/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper-3.4.5/bin/../lib/netty-3.2.2.Final.jar:/opt/zookeeper-3.4.5/bin/../lib/log4j-1.2.15.jar:/opt/zookeeper-3.4.5/bin/../lib/jline-0.9.94.jar:/opt/zookeeper-3.4.5/bin/../zookeeper-3.4.5.jar:/opt/zookeeper-3.4.5/bin/../src/java/lib/*.jar:/opt/zookeeper-3.4.5/bin/../conf:
2014-03-10 15:42:05,080 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2014-03-10 15:42:05,081 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2014-03-10 15:42:05,081 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2014-03-10 15:42:05,082 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2014-03-10 15:42:05,082 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2014-03-10 15:42:05,083 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=2.6.32-431.5.1.el6.x86_64
2014-03-10 15:42:05,083 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=hdfs
2014-03-10 15:42:05,084 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/home/hdfs
2014-03-10 15:42:05,084 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/hdfs
2014-03-10 15:42:05,087 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=hadoop2:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@37e79b10
Welcome to ZooKeeper!
2014-03-10 15:42:05,123 [myid:] - INFO  [main-SendThread(hadoop2:2181):ClientCnxn$SendThread@966] - Opening socket connection to server hadoop2/192.168.143.132:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2014-03-10 15:42:05,149 [myid:] - INFO  [main-SendThread(hadoop2:2181):ClientCnxn$SendThread@849] - Socket connection established to hadoop2/192.168.143.132:2181, initiating session
2014-03-10 15:42:05,183 [myid:] - INFO  [main-SendThread(hadoop2:2181):ClientCnxn$SendThread@1207] - Session establishment complete on server hadoop2/192.168.143.132:2181, sessionid = 0x144ad5a7ac70003, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: hadoop2:2181(CONNECTED) 0] ls /
[hadoop-ha, zookeeper]
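
You can also drill one level deeper: the nameservice znode that zkfc -formatZK created (testcluster in this setup, per the log above) should show up under /hadoop-ha:
[zk: hadoop2:2181(CONNECTED) 1] ls /hadoop-ha
[testcluster]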

Start all services:
Start the ZooKeeper service on all nodes first (if it is not running already):
hadoop1/2/3:
[zookeeper@hadoop1 ~]$ zkServer.sh start
JMX enabled by default
Using config: /opt/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

Use the jps command and examine the log files to make sure ZooKeeper is running fine on all nodes.
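Besides jps, zkServer.sh status is a quick sanity check; it reports whether each node has joined the quorum as leader or follower (expect exactly one leader across the three nodes):
[zookeeper@hadoop1 ~]$ zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower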

Start namenode on hadoop1 and hadoop2:
[hdfs@hadoop1 ~]$ ./start_namenode.sh
[hdfs@hadoop2 ~]$ ./start_namenode.sh

Use the jps command and examine the log files to make sure the namenode is running fine on hadoop1 and hadoop2.

Start journalnodes on hadoop1, hadoop2 and hadoop3:
[hdfs@hadoop1 ~]$ ./start_jnode.sh
[hdfs@hadoop2 ~]$ ./start_jnode.sh
[hdfs@hadoop3 ~]$ ./start_jnode.sh

Use the jps command and examine the log files to make sure the journalnode is running fine on all three nodes.

Start datanode on hadoop3:
[hdfs@hadoop3 ~]$ ./start_datanode.sh
starting datanode, logging to /var/log/hadoop//hadoop-hdfs-datanode-hadoop3.com.out

Use the jps command and examine the log file to make sure the datanode is running fine on hadoop3.

At this point, both namenodes are in "standby" mode.
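You can confirm this with hdfs haadmin. The NameNode IDs below (nn1 and nn2) are an assumption; substitute whatever IDs you configured under dfs.ha.namenodes.testcluster. Both should report standby:
[hdfs@hadoop1 ~]$ hdfs haadmin -getServiceState nn1
standby
[hdfs@hadoop1 ~]$ hdfs haadmin -getServiceState nn2
standby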

Start zkfc daemon on hadoop1 and hadoop2:
$ /opt/hadoop-2.2.0/sbin/hadoop-daemon.sh start zkfc
starting zkfc, logging to /var/log/hadoop//hadoop-hdfs-zkfc-hadoop1.com.out
$ tail -f /var/log/hadoop//hadoop-hdfs-zkfc-hadoop1.com.log
2014-03-10 15:49:18,870 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8019
2014-03-10 15:49:18,971 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-03-10 15:49:18,979 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8019: starting
2014-03-10 15:49:19,258 INFO org.apache.hadoop.ha.HealthMonitor: Entering state SERVICE_HEALTHY
2014-03-10 15:49:19,258 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hadoop1.com/192.168.143.131:8020 entered state: SERVICE_HEALTHY
2014-03-10 15:49:19,278 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2014-03-10 15:49:19,292 INFO org.apache.hadoop.ha.ActiveStandbyElector: No old node to fence
2014-03-10 15:49:19,292 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /hadoop-ha/testcluster/ActiveBreadCrumb to indicate that the local node is the most recent active...
2014-03-10 15:49:19,297 INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at hadoop1.com/192.168.143.131:8020 active...
2014-03-10 15:49:20,886 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at hadoop1.com/192.168.143.131:8020 to active state

Now one of the two namenodes, hadoop1 or hadoop2, is in the active state (hadoop1 in this case, as the log above shows).

Start the YARN processes, proxyserver and jobhistoryserver:
On hadoop1:
[yarn@hadoop1 ~]$ ./start_rmanager.sh
[yarn@hadoop1 ~]$ ./start_proxyserver.sh
[mapred@hadoop1 ~]$ ./start_historyserver.sh

On hadoop3:
[yarn@hadoop3 ~]$ ./start_nmanager.sh

So far, all processes are up and running:
On hadoop1:
[root@hadoop1 hadoop]# jps
44119 WebAppProxyServer
39452 QuorumPeerMain
43881 ResourceManager
44298 DFSZKFailoverController
40061 JournalNode
44361 Jps
44183 JobHistoryServer
42643 NameNode

On hadoop2:
[root@hadoop2 ~]# jps
28815 QuorumPeerMain
32164 Jps
32023 DFSZKFailoverController
32087 NameNode
29256 JournalNode

On hadoop3:
[root@hadoop3 /]# jps
18702 JournalNode
18853 DataNode
20018 NodeManager
18613 QuorumPeerMain
20124 Jps

Test Automatic Failover:
To test automatic failover, we are now going to shut down the namenode service on hadoop1; zkfc should then switch the hadoop2 namenode into the "active" state.

Shut down the namenode service on hadoop1:
On hadoop1:
[hdfs@hadoop1 ~]$ ./stop_namenode.sh
stopping namenode
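
Give the zkfc daemons a few seconds, then confirm the failover. Again assuming the NameNode ID for the hadoop2 namenode is nn2 (substitute your own ID), it should now report active:
[hdfs@hadoop2 ~]$ hdfs haadmin -getServiceState nn2
active
You can also watch the zkfc log on hadoop2 for the "Successfully transitioned ... to active state" message.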

Job testing:
$ hadoop org.apache.hadoop.examples.Grep grep-input/ grep-output 'of'
$ hadoop fs -cat /user/lxu/grep-output/part-r-00000
1065 of
$ hadoop fs -rm -r /user/lxu/grep-output
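
If the Grep class is not found on the classpath, the same job can be submitted through the examples jar shipped with the distribution (path assumed from the Hadoop 2.2.0 layout used here):
$ hadoop jar /opt/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar grep grep-input/ grep-output 'of'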


