Monday, July 08, 2013

Cloudera Manager - Both Namenodes in standby mode after Cloudera Manager 4.6 upgrade

Recently we upgraded our Cloudera Manager from 4.5.x to 4.6 (now called Cloudera Manager Standard). After the upgrade, both NameNodes in the "HDFS" service were in "Standby" mode. We tried a manual failover, but that did not work. The log file shows:

Unable to trigger a roll of the active NN
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category JOURNAL is not supported in state standby
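To confirm the symptom from a shell rather than the CM UI, `hdfs haadmin -getServiceState <serviceId>` prints "active" or "standby" for a single NameNode. The sketch below is a hypothetical check built on that command, not a Cloudera tool. NN_IDS stands in for the NameNode IDs listed under dfs.ha.namenodes.<nameservice> in hdfs-site.xml ("nn1 nn2" are placeholders). It also assumes the hdfs client on the host is configured for the HA nameservice.

```shell
#!/usr/bin/env bash
# Sketch: cross-check the NameNode HA states shown in the CM UI.
# NN_IDS is a hypothetical placeholder: use the IDs from dfs.ha.namenodes.<nameservice>.
NN_IDS="${NN_IDS:-nn1 nn2}"

# Print one "<id> <state>" line per NameNode, e.g. "nn1 standby".
collect_states() {
  for id in $NN_IDS; do
    echo "$id $(hdfs haadmin -getServiceState "$id" 2>/dev/null)"
  done
}

# Read "<id> <state>" lines on stdin; succeed only if exactly one is active.
exactly_one_active() {
  awk '$2 == "active" { n++ } END { exit (n == 1 ? 0 : 1) }'
}
```

Running `collect_states | exactly_one_active || echo "no single active NameNode"` would print the warning in the situation described above, where both NameNodes report standby.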

We got around this issue by disabling "Automatic Failover" for the nameservice and restarting the "hdfs" service. Whenever we re-enabled Automatic Failover, both NameNodes went back to standby mode with the error:

"Failed to initialize High Availability state in ZooKeeper. This might be because a ZNode for this nameservice is already created. Either remove the ZNode, or to reuse the ZNode skip this step and simply start the NameNodes and Failover Controllers. To retry, use the "Initialize High Availability state in ZooKeeper" command available as a Failover Controller action."

The real solution is to delete the HA ZNode ("/hadoop-ha") in ZooKeeper. We used the following steps to get it working again.

$ zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /hadoop-ha
[nameservice1]
[zk: localhost:2181(CONNECTED) 1] rmr /hadoop-ha


Then, in the CM UI, go to "All Services" -> "hdfs1" -> "Instances" -> "failovercontroller (xx1)" -> "Actions" -> "Initialize Automatic Failover Znode...", and restart the HDFS service.
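The same recovery can be sketched as a shell script. This is a hypothetical helper, not part of Cloudera Manager. It runs zkCli.sh non-interactively against ZK_SERVER (an assumed host:port), removes HA_ZNODE (/hadoop-ha, as in the session above), and then runs `hdfs zkfc -formatZK`, Hadoop's own command for initializing the HA state in ZooKeeper. It assumes the Failover Controllers are stopped and that the client configuration on the host points at the same ZooKeeper quorum. On a CM-managed cluster, the CM action above is probably the safer route. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: clear and re-initialize the HDFS HA state in ZooKeeper from a shell.
# Assumptions (hypothetical defaults, adjust for your cluster):
#   ZK_SERVER - any server of the ZooKeeper quorum, as host:port
#   HA_ZNODE  - the HA parent znode (/hadoop-ha in the session above)
#   DRY_RUN   - 1 (default) only prints the commands; 0 actually runs them
ZK_SERVER="${ZK_SERVER:-localhost:2181}"
HA_ZNODE="${HA_ZNODE:-/hadoop-ha}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reset_ha_znode() {
  # Show which nameservices are registered (e.g. [nameservice1]).
  run zkCli.sh -server "$ZK_SERVER" ls "$HA_ZNODE" || return 1
  # Recursively delete the znode; newer ZooKeeper releases call this "deleteall".
  run zkCli.sh -server "$ZK_SERVER" rmr "$HA_ZNODE" || return 1
  # Re-create the HA state in ZooKeeper (Failover Controllers assumed stopped).
  run hdfs zkfc -formatZK
}

reset_ha_znode
```

Defaulting to a dry run is a deliberate choice for a destructive operation: you can review the three commands before rerunning with, for example, `DRY_RUN=0 ZK_SERVER=zk1.example.com:2181`, where the host name is a placeholder.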
