Friday, June 20, 2014

Hadoop - The hostname and canonical name for this host are not consistent when checked from a Java process

If you are seeing the following error message in the Hadoop log file:
"The hostname and canonical name for this host are not consistent when checked from a Java process."

This is probably because the server's hostname and its canonicalized hostname are different. In Hadoop, when the datanode starts up, it goes through four steps to discover the name of the server:

  1. Get the local address (addr = InetAddress.getLocalHost()) and use the retrieved address to determine the hostname of the server.
  2. Get the hostname (addr.getHostName()) and the FQDN, i.e. the canonical hostname (addr.getCanonicalHostName()).
  3. Set the canonicalized name internally and use it as the official name sent to the namenode or jobtracker.
  4. Verify that 'hostname' can be used to derive 'canonicalHostName' using the information in /etc/resolv.conf:
    • If /etc/resolv.conf has an entry of the form "domain isp.domain.name", then 'hostname'.isp.domain.name must equal 'canonicalHostName' for the check to pass.
    • If /etc/resolv.conf has an entry of the form "search domain1.name domain2.name .. domainN.name", then 'canonicalHostName' must be one of 'hostname'.domain1.name, 'hostname'.domain2.name, .., 'hostname'.domainN.name.
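The step-4 rule can be sketched as a small Java class. This is a minimal illustration of the rule as described above, not the actual Hadoop or Cloudera Manager code. The class and method names are hypothetical, the resolv.conf contents in main are made up (server1 and server1.com match the example used in this post), and two simplifications are assumptions of this sketch: every domain on any "domain" or "search" line is accepted (the real resolver honors only the last such line), and an identical hostname and canonical name are also treated as consistent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the step-4 consistency rule; not Hadoop's or
// Cloudera Manager's actual implementation.
public class ResolvConfCheck {

    // Collects every domain listed on "domain" or "search" lines.
    // Simplification: the real resolver honors only the last such line.
    static List<String> candidateDomains(List<String> resolvConfLines) {
        List<String> domains = new ArrayList<>();
        for (String raw : resolvConfLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue; // skip blank lines and comments
            }
            String[] fields = line.split("\\s+");
            if (fields[0].equals("domain") || fields[0].equals("search")) {
                domains.addAll(Arrays.asList(fields).subList(1, fields.length));
            }
        }
        return domains;
    }

    // True if canonicalHostName is hostname + "." + one of the listed domains.
    static boolean isConsistent(String hostname, String canonicalHostName,
                                List<String> resolvConfLines) {
        // Assumption: identical names (hostname already set to the FQDN) also pass.
        if (hostname.equalsIgnoreCase(canonicalHostName)) {
            return true;
        }
        for (String domain : candidateDomains(resolvConfLines)) {
            // DNS names are case-insensitive, so compare ignoring case.
            if (canonicalHostName.equalsIgnoreCase(hostname + "." + domain)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical resolv.conf content.
        List<String> conf = Arrays.asList("search example.com corp.example.com");
        System.out.println(isConsistent("server1", "server1.com", conf));         // false
        System.out.println(isConsistent("server1", "server1.example.com", conf)); // true
    }
}
```

The first call fails because server1.com is not server1 followed by any listed domain, which is the kind of mismatch that triggers the error message.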

For example, suppose your hostname is server1 and your canonical hostname is server1.com.

You can use the following Java program to determine your hostname and canonical hostname:

import java.net.InetAddress;
import java.net.UnknownHostException;

// Save as Dns.java, then run: javac Dns.java && java Dns
public class Dns {
    public static void main(String[] args) throws UnknownHostException {
        InetAddress addr = InetAddress.getLocalHost();
        System.out.println(
            String.format(
                "IP:%s hostname:%s canonicalName:%s",
                addr.getHostAddress(),         // The "default" IP address
                addr.getHostName(),            // The hostname (from gethostname())
                addr.getCanonicalHostName()    // The canonicalized hostname (from resolver)
            )
        );
    }
}

Here are the things you should do:

  • Check /etc/resolv.conf on the hosts that are not 'healthy' and see what the domain and search lines contain. In this example, server1.com cannot be derived from server1 using the domain and search entries on those hosts, and/or those entries differ from the ones on the 'healthy' hosts.
  • Look for an INFO message of the form "hostname <hostname> differs from the canonical name <canonicalname>" in the agent logs (/var/log/cloudera-scm-agent/*.log) on the unhealthy hosts.
  • Change the hostname in /etc/sysconfig/network to the FQDN (server1.com). After that change, it works.
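If you have many hosts to check, a small scanner can pull the two names out of the agent logs. This is a hypothetical helper, not part of Cloudera Manager: it assumes the INFO message has exactly the form quoted above and takes the log files as command-line arguments (for example, the shell-expanded /var/log/cloudera-scm-agent/*.log).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports hosts whose agent log contains the
// "hostname ... differs from the canonical name ..." INFO message.
public class CanonicalNameLogScan {

    // Assumed format: hostname <hostname> differs from the canonical name <canonicalname>
    private static final Pattern MESSAGE = Pattern.compile(
            "hostname ([\\w.-]+) differs from the canonical name ([\\w.-]+)");

    // Drops a single trailing dot, e.g. when the name ends a sentence.
    private static String stripTrailingDot(String name) {
        return name.endsWith(".") ? name.substring(0, name.length() - 1) : name;
    }

    // Returns {hostname, canonicalName}, or null if the line has no such message.
    static String[] parse(String line) {
        Matcher m = MESSAGE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] {stripTrailingDot(m.group(1)), stripTrailingDot(m.group(2))};
    }

    public static void main(String[] args) throws IOException {
        for (String file : args) {
            // ISO-8859-1 decodes any byte sequence, so stray bytes won't abort the scan.
            for (String line : Files.readAllLines(Paths.get(file), StandardCharsets.ISO_8859_1)) {
                String[] names = parse(line);
                if (names != null) {
                    System.out.printf("%s: hostname=%s canonicalName=%s%n",
                            file, names[0], names[1]);
                }
            }
        }
    }
}
```

Any host this reports is a candidate for the /etc/resolv.conf and /etc/sysconfig/network checks in the list above.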
