In Hadoop 1, we used mapred.child.java.opts to set the Java heap size for the task tracker child processes.
With YARN, that parameter has been deprecated in favor of:
- mapreduce.map.java.opts – This parameter is passed to the JVMs that run the mappers.
- mapreduce.reduce.java.opts – This parameter is passed to the JVMs that run the reducers.
The key thing to understand is that if both mapred.child.java.opts and mapreduce.{map|reduce}.java.opts are specified, the settings in mapred.child.java.opts will be ignored.
These parameters can be set as follows, for example:
mapreduce.map.java.opts = -Xmx1280m
mapreduce.reduce.java.opts = -Xmx2304m
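For cluster-wide defaults, the same values can be placed in mapred-site.xml. A minimal sketch, reusing the illustrative heap sizes above:

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1280m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2304m</value>
</property>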
A common error we will see in the application when a task tries to run beyond those limits looks like this (truncated stack trace):
...YarnChild.main(YarnChild.java:162)
Caused by: java.lang.OutOfMemoryError: Java heap space
Now, assuming an error like the one above occurs in the reduce phase, we can increase the value of mapreduce.reduce.java.opts to:
mapreduce.reduce.java.opts = -Xmx3584m
However, we will face another error if we do not also increase the mapreduce.reduce.memory.mb value accordingly:
mapreduce.reduce.memory.mb = 4096
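Both values can also be overridden per job, without touching the cluster configuration, via -D generic options (assuming the job driver uses ToolRunner/GenericOptionsParser; the jar, class, and paths below are placeholders):

hadoop jar my-app.jar com.example.MyJob \
  -Dmapreduce.reduce.java.opts=-Xmx3584m \
  -Dmapreduce.reduce.memory.mb=4096 \
  /input/path /output/path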
A typical error message when the available container memory is lower than the maximum Java heap size is:
containerID=container_1420691746319_0001_01_000151] is running beyond physical memory limits. Current usage: 2.6 GB of 2.5 GB physical memory used; 4.8 GB of 12.5 GB virtual memory used. Killing container.
The general rule of thumb is that -Xmx (mapreduce.reduce.java.opts) should be about 80% of the corresponding mapreduce.reduce.memory.mb value, to leave room for stack and PermGen space.
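For example, with mapreduce.reduce.memory.mb = 4096, the rule suggests a heap of roughly 0.8 × 4096 MB ≈ 3276 MB, i.e. about -Xmx3276m; the -Xmx3584m used above is a somewhat more aggressive 87.5% of the container.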
As a summary:
Mapper and Reducer Settings:
- mapreduce.map.memory.mb – This defines the overall memory size of the container that YARN allocates for each mapper (JVM heap plus overhead). This value is in megabytes (MB).
- mapreduce.reduce.memory.mb – This defines the overall memory size of the container that YARN allocates for each reducer (JVM heap plus overhead). This value is in megabytes (MB). A configuration sketch follows this list.
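A minimal mapred-site.xml sketch for these container sizes (illustrative values, to be paired with -Xmx heap settings of roughly 80% of each, as discussed above):

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>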
Containers
As we should know by now, with YARN memory allocation everything runs in a container. This includes the following components:
- Application Master – It controls the execution of the application (such as re-running tasks when they fail).
- Mappers – These containers process the input files and extract the information needed to perform the job.
- Reducers – These containers take the information from the mappers and distill it into the final result of the job (which goes into the output files).
Global Settings
These two properties control container memory allocation:
- yarn.scheduler.minimum-allocation-mb – This is the smallest allowed value for a container. Any container request that asks for less than this will be set to this value. This value is in megabytes (MB).
- yarn.scheduler.maximum-allocation-mb – This is the largest allowed value for a container. Any container request that asks for more than this will be set to this value. This value is in megabytes (MB). A configuration sketch follows this list.
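A minimal yarn-site.xml sketch with illustrative values for these two limits:

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>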