Ever run into an out-of-memory error in your MR tasks? This can happen if you are using large lookups (DistributedCache) or data structures holding a huge number of entries, and the heap allocated to the MapReduce tasks is not sufficient to handle them.
Stepping back, each map and reduce task is launched in a separate JVM. The heap allocated to that JVM is set by the configuration parameter “mapred.child.java.opts”. If it is too small to handle your data, the Hadoop job will fail, complaining that it does not have sufficient heap to continue. Try increasing the heap size through Configuration, but keep in mind the amount of RAM, the number of cores, and the number of MR tasks configured on the nodes while adjusting the parameter.
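To make the RAM-versus-tasks trade-off concrete, here is a rough back-of-the-envelope check. It is only a sketch: the class name, the slot counts, and the reserved-memory figure are hypothetical, and it counts heap only, so real JVMs will need some extra headroom for non-heap memory.

```java
public class HeapBudget {

    /** Total heap, in MB, that all concurrently running task JVMs on one node may claim. */
    public static long totalTaskHeapMb(int mapSlots, int reduceSlots, long heapPerTaskMb) {
        return (long) (mapSlots + reduceSlots) * heapPerTaskMb;
    }

    /** True if the task heaps fit in node RAM after reserving memory for the OS and Hadoop daemons. */
    public static boolean fits(long nodeRamMb, long reservedMb,
                               int mapSlots, int reduceSlots, long heapPerTaskMb) {
        return totalTaskHeapMb(mapSlots, reduceSlots, heapPerTaskMb) <= nodeRamMb - reservedMb;
    }

    public static void main(String[] args) {
        // Hypothetical node: 32 GB RAM, 4 GB reserved, 8 map slots, 4 reduce slots.
        long nodeRamMb = 32 * 1024;
        long reservedMb = 4 * 1024;

        // 12 tasks * 2048 MB = 24576 MB, which fits in the 28672 MB left for tasks.
        System.out.println(fits(nodeRamMb, reservedMb, 8, 4, 2048)); // true

        // 12 tasks * 3072 MB = 36864 MB, which does not fit.
        System.out.println(fits(nodeRamMb, reservedMb, 8, 4, 3072)); // false
    }
}
```

If the check fails, either lower the per-task heap or reduce the number of tasks that run at the same time on each node.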
Here I have increased the heap size to 2 GB. Of course, I had plenty of RAM on the nodes in the cluster to be able to do so. Adding this as a resource to Configuration is one way; another is to simply inject it into Configuration by invoking the set method.
public void set(String name, String value)
- Set the value of the name property.
name – property name.
value – property value.
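As a minimal sketch of the set approach, the snippet below injects a 2 GB heap setting into a Configuration. It assumes the Hadoop client libraries are on the classpath; the class name HeapSizeExample is made up for illustration, and "-Xmx2048m" is the standard JVM flag for a 2048 MB maximum heap.

```java
import org.apache.hadoop.conf.Configuration;

public class HeapSizeExample {
    public static void main(String[] args) {
        // Loads the default resources found on the classpath.
        Configuration conf = new Configuration();

        // Pass -Xmx2048m to every child task JVM, capping its heap at 2 GB.
        conf.set("mapred.child.java.opts", "-Xmx2048m");

        // Read the value back to confirm it was stored.
        System.out.println(conf.get("mapred.child.java.opts"));
    }
}
```

Set the property before creating the job from this Configuration, since the job works from its own copy of the settings.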