Error sending messages to firehose: mgmt1-SERVICEMONITOR-46ebf1bb9c51277b3bd7cc6398f28303
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/firehose.py", line 70, in _send
    self._port)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 471, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.6/httplib.py", line 720, in connect
    self.timeout)
  File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
    raise error, msg
error: [Errno 111] Connection refused

All datanodes have the same hardware configuration and the same software packages.
In the datanode status page in CM, it says "The health of this role's host was concerning. The following health checks were concerning: agent status.". Like the picture below:
A soft restart of CM agent didn't help:
# /etc/init.d/cloudera-scm-agent restart
Stopping cloudera-scm-agent: [ OK ]
Starting cloudera-scm-agent: [ OK ]

A soft restart only restarts the CM agent process, not the processes that are managed by CM.
At least in my case, a hard restart was needed: it also restarts the supervisord process, and that made the error go away.
# /etc/init.d/cloudera-scm-agent hard_restart
Stopping cloudera-scm-agent: [ OK ]
Stopping supervisord: [ OK ]
Starting cloudera-scm-agent: [ OK ]
Cloudera Manager uses an open source process supervisor called supervisord, which takes care of redirecting log files, notifying of process failure, setting the effective user ID of the calling process to the right user, and so forth. "hard_restart" restarts the agent, the supervisord process, and all processes managed by supervisord. Of course the datanode will become "Bad" for a short while, but it will become "OK" again at the next heartbeat.
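
The traceback at the top ends in socket.create_connection failing with Errno 111, so one way to check a host before or after a restart is to attempt the same kind of TCP connection by hand. The following is only a minimal sketch in Python 3, not part of Cloudera Manager: the function name probe_port is made up for illustration, and the host and port to test are assumptions you have to fill in yourself, since the log above does not show which address the agent was trying to reach.

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Try to open a TCP connection to host:port.

    Returns (True, None) if the connection succeeds, otherwise
    (False, errno). On Linux, errno 111 is ECONNREFUSED, the same
    error shown in the agent traceback. errno may be None for a
    timeout.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        # Covers refused connections, timeouts, and name lookup failures.
        return False, exc.errno
    conn.close()
    return True, None
```

Calling probe_port with the monitoring host and its listening port (both placeholders here) from the affected datanode returns (True, None) when something accepts the connection. A result of (False, 111) means the connection is still being refused.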