Monday, August 12, 2013

Hadoop - java.lang.OutOfMemoryError: Java heap space

When you run a Hadoop MapReduce job, if you see this error in your log file:

2013-08-12 10:24:45,506 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:826)
 at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:376)
 at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:85)
 at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:584)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:656)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
 at org.apache.hadoop.mapred.Child.main(Child.java:262)

it means the task has exhausted the heap space allotted to its JVM. There are two ways to fix it:

1. Set a larger heap size before running the hadoop command (see the sketch after this list):
export HADOOP_OPTS="-Xmx4096m"
2. Add the following permanent setting to your mapred-site.xml file, which lives in HADOOP_HOME/conf/:

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx524288000</value>
  </property>


The -Xmx value here is in bytes (524288000 bytes = 500 MB); the usual suffixes also work, e.g. -Xmx512m.
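To make method #1 concrete, here is a minimal sketch of a job submission with the larger heap exported first. The jar, class, and path names are placeholders for your own job, not anything from this post:

export HADOOP_OPTS="-Xmx4096m"
hadoop jar my-job.jar com.example.MyJobDriver /input/path /output/path

The export only lasts for the current shell session, which is why this counts as the temporary fix.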

Method #1 is temporary; method #2 is permanent. If you are using Cloudera Manager, the setting is in the "mapreduce" service under "Configuration" > "Gateway (Default) / Resource Management" > "MapReduce Child Java Maximum Heap Size". Remember to do a "Deploy Client Configuration" afterwards.
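As a side note, if your job's driver class goes through ToolRunner / GenericOptionsParser, the same mapred.child.java.opts property from method #2 can also be overridden for a single run on the command line. Again, the jar and class names below are only placeholders, and the -D option must come before the job's own arguments:

hadoop jar my-job.jar com.example.MyJobDriver -D mapred.child.java.opts=-Xmx512m /input/path /output/path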


