Monday, August 12, 2013

Hadoop - java.lang.OutOfMemoryError: Java heap space

If you see this error in the task logs while running a Hadoop MapReduce job:

2013-08-12 10:24:45,506 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(
 at org.apache.hadoop.mapred.MapTask.createSortingCollector(
 at org.apache.hadoop.mapred.MapTask.access$100(
 at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(
 at org.apache.hadoop.mapred.MapTask.runNewMapper(
 at org.apache.hadoop.mapred.Child$
 at java.security.AccessController.doPrivileged(Native Method)
 at org.apache.hadoop.mapred.Child.main(

Clearly the task JVM has run out of the heap space allotted to it. In this particular trace the failure happens in MapTask$MapOutputBuffer.init, where the map-side sort buffer (sized by io.sort.mb) is allocated, so that buffer did not fit in the task's heap. There are two ways to solve this:

1. Execute the following before running the hadoop command:

    export HADOOP_OPTS="-Xmx4096m"

2. Add the following permanent setting to your mapred-site.xml file, which lives in HADOOP_HOME/conf/:

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx524288000</value>
    </property>

The value is in bytes: 524288000 bytes = 500 MB. The JVM also accepts size suffixes, so -Xmx500m is equivalent.

Method #1 is temporary; method #2 is permanent. If you are using Cloudera Manager, the setting is under the "mapreduce" service, "Configuration", "Gateway (Default) / Resource Management", "MapReduce Child Java Maximum Heap Size". Remember to do a "Deploy Client Configuration" afterwards.
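To verify that the new limit actually reached the task JVM, you can log the heap ceiling from inside the job. Here is a minimal sketch (the class name HeapCheck is mine; in a real job you would make the same Runtime call in your Mapper's setup() method and check the task log output):

```java
// HeapCheck.java - prints the maximum heap the current JVM will attempt to use.
// Run it with the same -Xmx you configured to confirm the flag is being honored.
public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() reports the -Xmx ceiling (approximately), in bytes
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
    }
}
```

Running `java -Xmx500m HeapCheck` should print a figure close to 500 MB; if a task log shows a much smaller number, your setting is not reaching the child JVMs.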
