Tuesday, October 01, 2013

Impala - Could not initialize class org.apache.hadoop.hbase.util.Classes


Error: Impala query gives error "Could not initialize class org.apache.hadoop.hbase.util.Classes"
Full log: http://paste.ubuntu.com/6179531/
Cloudera Jira: https://issues.cloudera.org/browse/IMPALA-609

Recently, after upgrading Impala to IMPALA 1.1.1-1.p0.17 and CDH to CDH 4.4.0-1.cdh4.4.0.p0.39, I got the following error while running an Impala query:

Query: select name from product where manufacturername = "Cat Store" limit 1;

Error from the terminal:
Fri Sep 27 13:42:00 EDT 2013, org.apache.hadoop.hbase.client.ScannerCallable@789df8c, org.apache.hadoop.ipc.RemoteException(java.lang.NoClassDefFoundError): IPC server unable to read call parameters: Could not initialize class org.apache.hadoop.hbase.util.Classes
Error from the Impala log file:

I0927 13:42:00.222398 21128 impala-server.cc:1081] Cancel(): query_id=5c4a0b605728cee4:f033de8e5d84d591

I0927 13:42:00.223323 21128 status.cc:44] Invalid or unknown query handle
    @           0x83af7d  (unknown)
    @           0x698a6e  (unknown)
    @           0x6e607d  (unknown)
    @           0x84e595  (unknown)
    @           0x84b04f  (unknown)
    @           0x6aac5e  (unknown)
    @          0x126aca9  (unknown)
    @          0x125becf  (unknown)
    @          0x125de14  (unknown)
    @          0x1270c52  (unknown)
    @       0x3b6d807851  (unknown)
    @       0x3b6d4e890d  (unknown)

We installed Impala through Cloudera Manager using parcels. We have 6 DataNodes, but only one node was failing (a basic select query such as "select * from table_name limit 1;" worked fine). From the Impala log file:

Instance 2b4ab7cc9aecb8ba:89587d4cf3e489b8 (host=rtldn3.hadoop.com:22000):
    Instance 2b4ab7cc9aecb8ba:89587d4cf3e489b9 (host=rtldn5.hadoop.com:22000):
    Instance 2b4ab7cc9aecb8ba:89587d4cf3e489bb (host=rtldn6.hadoop.com:22000):
    Instance 2b4ab7cc9aecb8ba:89587d4cf3e489bc (host=rtldn4.hadoop.com:22000):
    Instance 2b4ab7cc9aecb8ba:89587d4cf3e489bd (host=rtldn2.hadoop.com:22000):

I0930 16:24:08.230409  3296 coordinator.cc:486] Query id=2b4ab7cc9aecb8ba:89587d4cf3e489b6 failed because fragment id=2b4ab7cc9aecb8ba:89587d4cf3e489bb failed.
I0930 16:24:08.230466  9527 coordinator.cc:597] All backends finished or error.
I0930 16:24:08.230736  9527 impala-server.cc:951] UnregisterQuery(): query_id=2b4ab7cc9aecb8ba:89587d4cf3e489b6
I0930 16:24:08.230880  9527 impala-server.cc:1033] Cancel(): query_id=2b4ab7cc9aecb8ba:89587d4cf3e489b6
I0930 16:24:08.233968  9527 data-stream-mgr.cc:274] DeregisterRecvr(): fragment_instance_id=2b4ab7cc9aecb8ba:89587d4cf3e489b7, node=1
I0930 16:24:08.235227  9527 impala-server.cc:1033] Cancel(): query_id=2b4ab7cc9aecb8ba:89587d4cf3e489b6
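
To pin down which impalad was the problem, one quick check is to point impala-shell at each node in turn and re-run the failing query. This is just a sketch: the hostnames come from the fragment list above, and it assumes the default impala-shell port 21000 (the 22000 in the log is the backend port).

    # re-run the failing query against one impalad at a time to isolate the bad node
    for host in rtldn2 rtldn3 rtldn4 rtldn5 rtldn6; do
        echo "== $host =="
        impala-shell -i $host.hadoop.com:21000 -q "select name from product where manufacturername = 'Cat Store' limit 1"
    done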

After some digging, we found that the "HBase Region Health Canary Report" gave the same error (http://paste.ubuntu.com/6179569/), which made me think it was possibly an HBase issue.

I logged into the problematic node, took a look at the HBase log file, and found the following warning:

WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client 10.6.70.3

In our cluster, 10.6.70.3 is the HBase master, so this tells me the region server is not talking to the master node properly. I checked the other region servers; they didn't have this issue.
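
A quick way to see which region servers are hitting this is to grep for the warning in the RegionServer log on each node. The log path below is the usual Cloudera Manager default and may differ on your cluster, so treat it as an assumption:

    # look for the deserialization warning in the RegionServer logs
    grep "Unable to read call parameters" /var/log/hbase/*.log*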

Fix: simply restart the region server service on the problematic node; the restart should solve the problem. For some reason, this specific DataNode didn't refresh the HBase jar file it had been using (the org.apache.hadoop.hbase.util.Classes class is in the latest HBase jar file).
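
On our parcel-based deployment the restart is done from Cloudera Manager (HBase service -> Instances -> the affected RegionServer role -> Restart). On a plain package install, something along these lines should work; the service name below is the standard CDH package name and is an assumption here:

    # restart the RegionServer on the problematic node (package install; parcel/CM users restart the role from Cloudera Manager)
    sudo service hbase-regionserver restart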

It is also good practice to check each node and make sure they all use the same HBase jar file; you can run the "hbase hbck" (HBase fsck) command as well.
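
A rough sketch of that check, run on each node (the parcel path is just where CDH parcels normally live, so adjust for your layout):

    # list the hbase jars that are actually on this node's classpath
    hbase classpath | tr ':' '\n' | grep -i 'hbase.*\.jar'
    # confirm the class from the error message is really inside the jar
    for jar in /opt/cloudera/parcels/CDH/lib/hbase/hbase*.jar; do
        echo "$jar"
        unzip -l "$jar" | grep 'hbase/util/Classes.class'
    done
    # then run the HBase consistency check
    hbase hbck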
