Wednesday, July 03, 2013

Hadoop - CM4.6.0, DataNode trying to talk to NameNode on port 8022

Starting with Cloudera Manager 4.6.0, the NameNode service RPC address defaults to port 8022, while previous versions left it empty. So if you upgrade CM to 4.6.0 and haven't done a full service restart yet, you will probably see the following error in the DataNode log:

2013-07-01 17:15:09,351 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /172.16.1.29:50020
2013-07-01 17:15:09,363 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: null
2013-07-01 17:15:09,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: <default>
2013-07-01 17:15:09,394 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (storage id unknown) service to nn01/172.16.1.4:8022 starting to offer service
2013-07-01 17:15:09,504 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2013-07-01 17:15:09,504 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2013-07-01 17:15:10,518 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn01/172.16.1.4:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
<after 10 retries>
2013-07-01 17:15:19,531 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: nn01/172.16.1.4:8022

To fix this, do a full service restart after the upgrade.
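After the restart, a quick way to confirm that the NameNode is actually accepting connections on the service RPC port is a plain TCP connect test. Below is a minimal Python sketch of one way to do that. The function name `port_is_open` and the 2-second default timeout are my own choices, not part of Hadoop or Cloudera Manager. It only checks that something is listening on the port, not that it is a healthy NameNode.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and host names that don't resolve.
        return False
```

For example, with the host and port from the log above, `port_is_open("nn01", 8022)` should return True once the NameNode is listening on 8022. If it still returns False after a full restart, move on to the known issue below.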

If you still see this error after the restart, you are probably hitting this known issue:

Upgrade to 4.6.0 with HA enabled will cause HDFS restarts/failovers to fail.
When upgrading to 4.6.0 with HDFS High Availability enabled, if the service-rpc port is not already configured, the upgrade will change the port value to 8022, which will cause HDFS failover or restart to fail.


Severity: Med
Anticipated Resolution: To be fixed in an upcoming release.
Workaround: For each NameNode, add an entry to the hdfs-site.xml service-wide safety valve that specifies an empty entry for the servicerpc-address:
<property>
   <name>dfs.namenode.servicerpc-address.{nameservice_name}.{namenode_id}</name>
   <value></value>
</property>
To find the nameservice names and their corresponding NameNode IDs, look in a NameNode's hdfs-site.xml for properties of the form dfs.ha.namenodes.{nameservice_name}; the value of each one is a comma-separated list of that nameservice's NameNode IDs. You can view the hdfs-site.xml of a running NameNode by clicking the NameNode, opening the Processes tab, clicking "Show", and then clicking the file name.
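With several nameservices or NameNodes, writing these entries by hand gets tedious. Here is a minimal Python sketch, assuming you have copied the NameNode's hdfs-site.xml contents into a string. It reads every dfs.ha.namenodes.{nameservice_name} property and prints one empty servicerpc-address entry per NameNode ID, ready to paste into the safety valve. The function name and the sample nameservice `ns1` with NameNode IDs `nn1,nn2` are hypothetical. Substitute the values from your own cluster.

```python
import xml.etree.ElementTree as ET

PREFIX = "dfs.ha.namenodes."


def servicerpc_overrides(hdfs_site_xml: str) -> str:
    """Build empty dfs.namenode.servicerpc-address entries for every HA NameNode."""
    root = ET.fromstring(hdfs_site_xml)
    entries = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if not name.startswith(PREFIX):
            continue
        nameservice = name[len(PREFIX):]
        # The value is a comma-separated list of NameNode IDs, e.g. "nn1,nn2".
        for nn_id in (prop.findtext("value") or "").split(","):
            nn_id = nn_id.strip()
            if nn_id:
                entries.append(
                    "<property>\n"
                    f"   <name>dfs.namenode.servicerpc-address.{nameservice}.{nn_id}</name>\n"
                    "   <value></value>\n"
                    "</property>"
                )
    return "\n".join(entries)


# Hypothetical sample: one nameservice "ns1" with NameNode IDs "nn1" and "nn2".
SAMPLE = """<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(servicerpc_overrides(SAMPLE))
```

For the sample above this prints two entries, one for ns1.nn1 and one for ns1.nn2, each with an empty value.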
