ERROR org.apache.zookeeper.server.persistence.FileTxnSnapLog Parent /cloudera_manager_zookeeper_canary missing for /cloudera_manager_zookeeper_canary/zookeeper1-SERVER-d75e87ec7c989094688ff05ac1e2c2e0
org.apache.zookeeper.server.persistence.FileSnap Reading snapshot /data1/zookeeper/version-2/snapshot.3a001026c6
Unable to load database on disk
java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /cloudera_manager_zookeeper_canary
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:188)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:156)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /cloudera_manager_zookeeper_canary
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:250)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:186)
    ... 6 more
The log file tells me that the server is not starting up because it cannot load its database from disk. It fails right after reading "snapshot.3a001026c6", so this file could be corrupted. To clean out the corrupted data files and let the server generate new ones, we need to delete all the files in the version-2 directory under the dataDir. But before you do that, make sure all the other servers in your ensemble are up and working. You can use the "stat" command to verify that:
# echo "stat" | nc zookeeper1.tony.com 2181
Zookeeper version: 3.4.5-cdh4.5.0--1, built on 11/20/2013 22:29 GMT
Clients:
 /10.6.70.35:60982[1](queued=0,recved=141,sent=141)
 /10.6.70.2:52908[1](queued=0,recved=148,sent=151)
 /10.6.70.2:52988[1](queued=0,recved=147,sent=147)
 /10.6.70.33:44230[1](queued=0,recved=261,sent=269)
 /10.6.70.3:33691[1](queued=0,recved=272,sent=274)
 /10.6.70.30:43581[1](queued=0,recved=252,sent=260)
 /10.6.70.3:35740[0](queued=0,recved=1,sent=0)
 /10.6.70.3:33639[1](queued=0,recved=345,sent=348)
 /10.6.70.33:44252[1](queued=0,recved=150,sent=150)
 /10.6.70.32:34600[1](queued=0,recved=146,sent=146)
 /10.6.70.30:43695[1](queued=0,recved=141,sent=141)

Latency min/avg/max: 0/0/11
Received: 2603
Sent: 2777
Connections: 11
Outstanding: 0
Zxid: 0x3d00000468
Mode: leader
Node count: 48
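If the ensemble has several servers, running the check by hand for each one gets tedious. Below is a minimal sketch that loops over a list of hosts and prints the "Mode:" line from each "stat" reply. The function names zk_mode and check_ensemble are made up for this example, any hostname other than zookeeper1.tony.com is an assumption, and the "-w" timeout flag is assumed to be supported by your nc.

```shell
#!/bin/sh
# Extract the value of the "Mode:" line (e.g. leader, follower)
# from "stat" output read on stdin.
zk_mode() {
    awk -F': ' '$1 == "Mode" { print $2 }'
}

# Print the mode of every host passed as an argument.
# A host that does not answer is reported as NOT RESPONDING.
check_ensemble() {
    port=2181   # client port used in the example above
    for host in "$@"; do
        mode=$(echo "stat" | nc -w 5 "$host" "$port" 2>/dev/null | zk_mode)
        echo "$host: ${mode:-NOT RESPONDING}"
    done
}

# Example call (zookeeper2/zookeeper3 are hypothetical hostnames):
# check_ensemble zookeeper2.tony.com zookeeper3.tony.com
```

A healthy ensemble should show exactly one leader and the remaining servers as followers.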
After you have verified that all the other servers of the ensemble are up, you can go ahead and clean the database of the corrupt server. Delete all the files in dataDir/version-2 and dataLogDir/version-2, then restart the server. On startup it will sync a fresh copy of the data from the leader.
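The cleanup step can be sketched as a small shell function. Note that this is a more cautious variant than the plain delete described above: it moves the files into a backup directory so they can be restored if something goes wrong. The function name set_aside_version2 and the backup paths are made up for this example, /data1/zookeeper is the dataDir seen in the log, and the dataLogDir path is a placeholder you need to replace with your own.

```shell
#!/bin/sh
# Move every file in <dir>/version-2 into <backup> instead of deleting it.
# The version-2 directory itself and anything beside it (such as the
# myid file in dataDir) are left untouched.
set_aside_version2() {
    dir=$1
    backup=$2
    if [ ! -d "$dir/version-2" ]; then
        echo "no version-2 directory under $dir" >&2
        return 1
    fi
    mkdir -p "$backup" || return 1
    for f in "$dir"/version-2/*; do
        # Skip the unexpanded pattern when the directory is already empty.
        if [ -e "$f" ]; then
            mv "$f" "$backup"/ || return 1
        fi
    done
}

# Example calls, to be run only while this ZooKeeper server is stopped
# (backup locations and the dataLogDir path are hypothetical):
# set_aside_version2 /data1/zookeeper /tmp/zk-backup/data
# set_aside_version2 /path/to/dataLogDir /tmp/zk-backup/datalog
```

Once the server has restarted and rejoined the ensemble, the backup directories can be removed.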