Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:
- Failure detection: Each of the namenode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other namenode that a failover should be triggered.
- Active namenode election: ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active namenode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.
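Purely as an illustration of the mechanism, the short zkCli session below creates an ephemeral znode by hand and shows that ZooKeeper removes it as soon as the creating session ends; the /demo-lock path is made up for this example and is not used by the failover setup:

[zk: hadoop2:2181(CONNECTED) 0] create -e /demo-lock "owned-by-this-session"
[zk: hadoop2:2181(CONNECTED) 1] ls /
[demo-lock, zookeeper]
[zk: hadoop2:2181(CONNECTED) 2] quit

Reconnecting with zkCli.sh and running ls / again shows that /demo-lock is gone, because the ephemeral node dies with the session that created it. The ZKFC's lock znode behaves the same way, which is what turns a namenode crash into an automatic re-election.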
The ZKFailoverController (ZKFC) is a new component: a ZooKeeper client that also monitors and manages the state of the namenode. Each machine that runs a namenode also runs a ZKFC, and that ZKFC is responsible for:
- Health monitoring - the ZKFC periodically pings its local namenode with a health-check command (the interval is controlled by the ha.health-monitor.check-interval.ms parameter). So long as the namenode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy.
- ZooKeeper session management - when the local namenode is healthy, the ZKFC holds a session open in ZooKeeper. If the local namenode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
- ZooKeeper-based election - if the local namenode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local namenode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local namenode transitions to the active state.
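For comparison, the manual failover referred to above is driven by the hdfs haadmin command. A minimal sketch, assuming nn1 and nn2 are the namenode IDs defined under dfs.ha.namenodes.testcluster in hdfs-site.xml (adjust to your own IDs):

$ hdfs haadmin -failover nn1 nn2     # administrator-initiated failover in a manual-failover setup
$ hdfs haadmin -checkHealth nn1      # run by hand the kind of health check the ZKFC performs

With automatic failover enabled, the ZKFCs take care of both of these steps themselves.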
I assume you have already set up a ZooKeeper cluster running on three or more nodes. In my case:
- hadoop1.com - namenode, journalnode, ZooKeeper, resourcemanager, proxyserver, jobhistoryserver
- hadoop2.com - standby namenode, journalnode, ZooKeeper
- hadoop3.com - datanode, journalnode, ZooKeeper, nodemanager
Shutdown cluster:
It is not currently possible to transition from a manual failover setup to an automatic failover setup while the cluster is running.
On hadoop3:
[yarn@hadoop3 ~]$ ./stop_nmanager.sh
stopping nodemanager
[hdfs@hadoop3 ~]$ ./stop_jnode.sh
stopping journalnode
[hdfs@hadoop3 ~]$ ./stop_datanode.sh
stopping datanode
Verify that only the ZooKeeper process is running:
[root@hadoop3 opt]# jps
43082 QuorumPeerMain
43364 Jps
On hadoop2:
[hdfs@hadoop2 ~]$ ./stop_jnode.sh
stopping journalnode
[hdfs@hadoop2 ~]$ ./stop_namenode.sh
stopping namenode
On hadoop1:
[hdfs@hadoop1 ~]$ ./stop_namenode.sh
stopping namenode
[hdfs@hadoop1 ~]$ ./stop_jnode.sh
stopping journalnode
[yarn@hadoop1 ~]$ ./stop_rmanager.sh
stopping resourcemanager
[yarn@hadoop1 ~]$ ./stop_proxyserver.sh
stopping proxyserver
[mapred@hadoop1 ~]$ ./stop_historyserver.sh
stopping historyserver
On all servers, as user root, confirm that only the ZooKeeper process is running:
[root@hadoop1/2/3 opt]# jps
43082 QuorumPeerMain
43376 Jps
Update the hdfs-site.xml and core-site.xml configuration files:
Edit the hdfs-site.xml file and add:
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
  <description>The cluster should be set up for automatic failover.</description>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.testcluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
Edit the core-site.xml file and add:
<property>
  <name>ha.zookeeper.quorum</name>
  <value>hadoop1.com:2181,hadoop2.com:2181,hadoop3.com:2181</value>
  <description>The list of host-port pairs running the ZooKeeper service.</description>
</property>
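Optionally, it is worth checking that every host:port pair listed in ha.zookeeper.quorum is actually answering. A quick sketch using ZooKeeper's built-in ruok four-letter command (assumes nc/netcat is installed); each healthy server should reply imok:

$ for zk in hadoop1.com hadoop2.com hadoop3.com; do echo -n "$zk: "; echo ruok | nc $zk 2181; echo; done
hadoop1.com: imok
hadoop2.com: imok
hadoop3.com: imok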
Distribute the new hdfs-site.xml and core-site.xml to all servers (hadoop2 and hadoop3).
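To confirm that each host actually picked up the new settings, you can ask Hadoop what value it resolves for the new keys (run on hadoop1, hadoop2 and hadoop3):

$ hdfs getconf -confKey dfs.ha.automatic-failover.enabled
true
$ hdfs getconf -confKey ha.zookeeper.quorum
hadoop1.com:2181,hadoop2.com:2181,hadoop3.com:2181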
Initializing HA state in ZooKeeper:
Initialize the required state in ZooKeeper. Run the following command from one of the namenode hosts (hadoop1):
$ hdfs zkfc -formatZK
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /opt/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/03/10 15:35:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/10 15:35:53 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at hadoop1.com/192.168.143.131:8020
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:host.name=hadoop1
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_51
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/jdk1.7.0/jre
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/opt/hadoop-2.2.0/etc/hadoop:/opt/hadoop-2.2.0/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:.../contrib/capacity-scheduler/*.jar
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/opt/hadoop-2.2.0/lib/native
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-431.5.1.el6.x86_64
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:user.name=hdfs
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hdfs
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hdfs
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1.com:2181,hadoop2.com:2181,hadoop3.com:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@74b22c1d
14/03/10 15:35:54 INFO zookeeper.ClientCnxn: Opening socket connection to server hadoop2/192.168.143.132:2181. Will not attempt to authenticate using SASL (unknown error)
14/03/10 15:35:54 INFO zookeeper.ClientCnxn: Socket connection established to hadoop2/192.168.143.132:2181, initiating session
14/03/10 15:35:54 INFO zookeeper.ClientCnxn: Session establishment complete on server hadoop2/192.168.143.132:2181, sessionid = 0x144ad5a7ac70001, negotiated timeout = 5000
14/03/10 15:35:54 INFO ha.ActiveStandbyElector: Session connected.
14/03/10 15:35:54 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/testcluster in ZK.
14/03/10 15:35:54 INFO zookeeper.ZooKeeper: Session: 0x144ad5a7ac70001 closed
14/03/10 15:35:54 INFO zookeeper.ClientCnxn: EventThread shut down
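A side note: if zkfc -formatZK is ever re-run (for example after renaming the nameservice), it will find the existing /hadoop-ha znode and prompt before overwriting it. In Hadoop 2.x the command also accepts two flags for non-interactive use; double-check them against your version:

$ hdfs zkfc -formatZK -force           # recreate the znode without prompting (wipes existing failover state in ZooKeeper)
$ hdfs zkfc -formatZK -nonInteractive  # never prompt; leave an already-formatted znode untouched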
Check that the new znode "hadoop-ha" has been created:
$ /opt/zookeeper-3.4.5/bin/zkCli.sh -server hadoop2:2181
Connecting to hadoop2:2181
2014-03-10 15:42:05,071 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2014-03-10 15:42:05,077 [myid:] - INFO [main:Environment@100] - Client environment:host.name=hadoop1
2014-03-10 15:42:05,078 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_51
2014-03-10 15:42:05,079 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2014-03-10 15:42:05,079 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/jdk1.7.0/jre
2014-03-10 15:42:05,080 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/opt/zookeeper-3.4.5/bin/../build/classes:/opt/zookeeper-3.4.5/bin/../build/lib/*.jar:/opt/zookeeper-3.4.5/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper-3.4.5/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper-3.4.5/bin/../lib/netty-3.2.2.Final.jar:/opt/zookeeper-3.4.5/bin/../lib/log4j-1.2.15.jar:/opt/zookeeper-3.4.5/bin/../lib/jline-0.9.94.jar:/opt/zookeeper-3.4.5/bin/../zookeeper-3.4.5.jar:/opt/zookeeper-3.4.5/bin/../src/java/lib/*.jar:/opt/zookeeper-3.4.5/bin/../conf:
2014-03-10 15:42:05,080 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2014-03-10 15:42:05,081 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2014-03-10 15:42:05,081 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2014-03-10 15:42:05,082 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2014-03-10 15:42:05,082 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2014-03-10 15:42:05,083 [myid:] - INFO [main:Environment@100] - Client environment:os.version=2.6.32-431.5.1.el6.x86_64
2014-03-10 15:42:05,083 [myid:] - INFO [main:Environment@100] - Client environment:user.name=hdfs
2014-03-10 15:42:05,084 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/hdfs
2014-03-10 15:42:05,084 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/home/hdfs
2014-03-10 15:42:05,087 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=hadoop2:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@37e79b10
Welcome to ZooKeeper!
2014-03-10 15:42:05,123 [myid:] - INFO [main-SendThread(hadoop2:2181):ClientCnxn$SendThread@966] - Opening socket connection to server hadoop2/192.168.143.132:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2014-03-10 15:42:05,149 [myid:] - INFO [main-SendThread(hadoop2:2181):ClientCnxn$SendThread@849] - Socket connection established to hadoop2/192.168.143.132:2181, initiating session
2014-03-10 15:42:05,183 [myid:] - INFO [main-SendThread(hadoop2:2181):ClientCnxn$SendThread@1207] - Session establishment complete on server hadoop2/192.168.143.132:2181, sessionid = 0x144ad5a7ac70003, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: hadoop2:2181(CONNECTED) 0] ls /
[hadoop-ha, zookeeper]
Start all services:
Start ZooKeeper service on all nodes first (if ZooKeeper service is not running already):
hadoop1/2/3:
[zookeeper@hadoop1 ~]$ zkServer.sh start
JMX enabled by default
Using config: /opt/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Use the jps command and examine the log files to make sure ZooKeeper is running fine on all nodes.
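Besides jps, zkServer.sh itself can report whether each node has joined the quorum and which role it took; the exact output wording may vary slightly between ZooKeeper versions:

[zookeeper@hadoop1 ~]$ zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower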
Start namenode on hadoop1 and hadoop2:
[hdfs@hadoop1 ~]$ ./start_namenode.sh
[hdfs@hadoop2 ~]$ ./start_namenode.sh
Use the jps command and examine the log files to make sure the namenode is running fine on both nodes.
Start journalnodes on hadoop1, hadoop2 and hadoop3:
[hdfs@hadoop1 ~]$ ./start_jnode.sh
[hdfs@hadoop2 ~]$ ./start_jnode.sh
[hdfs@hadoop3 ~]$ ./start_jnode.sh
Use the jps command and examine the log files to make sure the journalnode is running fine on all three nodes.
Start datanode on hadoop3:
[hdfs@hadoop3 ~]$ ./start_datanode.sh
starting datanode, logging to /var/log/hadoop//hadoop-hdfs-datanode-hadoop3.com.out
Use the jps command and examine the log file to make sure the datanode is running fine on hadoop3.
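The start_*.sh and stop_*.sh scripts used throughout this post are site-specific helpers. If you do not have equivalents, the stock hadoop-daemon.sh script (the same one used for the zkfc below) can start the individual daemons directly; for example, as the hdfs user on the appropriate host:

$ /opt/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode
$ /opt/hadoop-2.2.0/sbin/hadoop-daemon.sh start journalnode
$ /opt/hadoop-2.2.0/sbin/hadoop-daemon.sh start datanode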
At this point, both namenodes are in "standby" mode.
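You can confirm this from the command line with haadmin; here nn1 and nn2 stand in for whatever namenode IDs are defined in dfs.ha.namenodes.testcluster in your hdfs-site.xml:

$ hdfs haadmin -getServiceState nn1
standby
$ hdfs haadmin -getServiceState nn2
standby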
Start zkfc daemon on hadoop1 and hadoop2:
$ /opt/hadoop-2.2.0/sbin/hadoop-daemon.sh start zkfc
starting zkfc, logging to /var/log/hadoop//hadoop-hdfs-zkfc-hadoop1.com.out
$ tail -f /var/log/hadoop//hadoop-hdfs-zkfc-hadoop1.com.log
2014-03-10 15:49:18,870 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8019
2014-03-10 15:49:18,971 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-03-10 15:49:18,979 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8019: starting
2014-03-10 15:49:19,258 INFO org.apache.hadoop.ha.HealthMonitor: Entering state SERVICE_HEALTHY
2014-03-10 15:49:19,258 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hadoop1.com/192.168.143.131:8020 entered state: SERVICE_HEALTHY
2014-03-10 15:49:19,278 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2014-03-10 15:49:19,292 INFO org.apache.hadoop.ha.ActiveStandbyElector: No old node to fence
2014-03-10 15:49:19,292 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /hadoop-ha/testcluster/ActiveBreadCrumb to indicate that the local node is the most recent active...
2014-03-10 15:49:19,297 INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at hadoop1.com/192.168.143.131:8020 active...
2014-03-10 15:49:20,886 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at hadoop1.com/192.168.143.131:8020 to active state
Now one of the two namenodes (hadoop1 or hadoop2) is in the active state.
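The election result is also visible in ZooKeeper itself: the ZKFC of the active namenode holds an ephemeral lock znode plus the ActiveBreadCrumb znode seen in the log above, both under /hadoop-ha/testcluster (the lock znode name below is what Hadoop 2.2 creates; verify against your version):

[zk: hadoop2:2181(CONNECTED) 0] ls /hadoop-ha/testcluster
[ActiveBreadCrumb, ActiveStandbyElectorLock]

Reading the breadcrumb (get /hadoop-ha/testcluster/ActiveBreadCrumb) shows which namenode most recently won the election.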
Start the YARN processes, proxyserver, and jobhistoryserver:
On hadoop1:
[yarn@hadoop1 ~]$ ./start_rmanager.sh
[yarn@hadoop1 ~]$ ./start_proxyserver.sh
[mapred@hadoop1 ~]$ ./start_historyserver.sh
On hadoop3:
[yarn@hadoop3 ~]$ ./start_nmanager.sh
So far, all processes are up and running:
On hadoop1:
[root@hadoop1 hadoop]# jps
44119 WebAppProxyServer
39452 QuorumPeerMain
43881 ResourceManager
44298 DFSZKFailoverController
40061 JournalNode
44361 Jps
44183 JobHistoryServer
42643 NameNode
On hadoop2:
[root@hadoop2 ~]# jps
28815 QuorumPeerMain
32164 Jps
32023 DFSZKFailoverController
32087 NameNode
29256 JournalNode
On hadoop3:
[root@hadoop3 /]# jps
18702 JournalNode
18853 DataNode
20018 NodeManager
18613 QuorumPeerMain
20124 Jps
Test Automatic Failover:
To test automatic failover, we are going to shut down the namenode service on hadoop1; the ZKFC should then switch the hadoop2 namenode into the "active" state.
Shut down the namenode service on hadoop1:
On hadoop1:
[hdfs@hadoop1 ~]$ ./stop_namenode.sh
stopping namenode
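To verify that the failover actually happened, the ZKFC log on hadoop2 should now show it winning the election and transitioning its namenode to active; the log file name below simply follows the same pattern as on hadoop1 and may differ in your setup:

[hdfs@hadoop2 ~]$ grep "active state" /var/log/hadoop//hadoop-hdfs-zkfc-hadoop2.com.log
... INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at hadoop2.com/192.168.143.132:8020 to active state

Running hdfs haadmin -getServiceState against the hadoop2 namenode ID should likewise report active.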
Job testing:
$ hadoop org.apache.hadoop.examples.Grep grep-input/ grep-output 'of'
$ hadoop fs -cat /user/lxu/grep-output/part-r-00000
1065 of
$ hadoop fs -rm -r /user/lxu/grep-output
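One note on the job above: the grep-input directory has to exist in HDFS before the Grep example will run. A minimal preparation sketch, using the Hadoop config files as sample input (any text files will do):

$ hadoop fs -mkdir -p grep-input
$ hadoop fs -put /opt/hadoop-2.2.0/etc/hadoop/*.xml grep-input/

Re-running the same job after the failover is a simple way to confirm that clients are transparently redirected to the new active namenode.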