Checking Yarn child execution environment


Never leave home without this trick: find the YarnChild process with jps, then dump its environment straight from /proc:

$ sudo -u yarn jps
27343 YarnChild
4156 NodeManager
27292 Jps

$ sudo strings -f /proc/27343/environ
/proc/27343/environ: STDERR_LOGFILE_ENV=/var/log/hadoop-yarn/containers/application_1485807340469_0019/container_1485807340469_0019_01_000003/stderr
/proc/27343/environ: SHELL=/bin/bash
/proc/27343/environ: TERM=linux
/proc/27343/environ: HADOOP_HOME=/usr/lib/hadoop
/proc/27343/environ: YARN_PID_DIR=/var/run/hadoop-yarn
/proc/27343/environ: NM_HOST=ip-172-31-5-156.us-west-2.compute.internal
/proc/27343/environ: HADOOP_PREFIX=/usr/lib/hadoop
/proc/27343/environ: YARN_OPTS= -XX:OnOutOfMemoryError='kill -9 %p' -XX:OnOutOfMemoryError='kill -9 %p' -server  -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=yarn-yarn-nodemanager-ip-172-31-5-156.log -Dyarn.log.file=yarn-yarn-nodemanager-ip-172-31-5-156.log -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.root.logger=INFO,DRFA -Dyarn.root.logger=INFO,DRFA -Dsun.net.inetaddr.ttl=30 -Djava.library.path=:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native
/proc/27343/environ: NM_AUX_SERVICE_mapreduce_shuffle=AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
/proc/27343/environ: YARN_NICENESS=0
/proc/27343/environ: NM_HTTP_PORT=8042
/proc/27343/environ: LOCAL_DIRS=/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019,/mnt1/yarn/usercache/hadoop/appcache/application_1485807340469_0019
/proc/27343/environ: USER=hadoop
/proc/27343/environ: JAVA_LIBRARY_PATH=:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native
/proc/27343/environ: LD_LIBRARY_PATH=/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000003:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
/proc/27343/environ: JSVC_HOME=/usr/lib/bigtop-utils
/proc/27343/environ: HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec
/proc/27343/environ: HADOOP_TOKEN_FILE_LOCATION=/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000003/container_tokens
/proc/27343/environ: SVC_USER=yarn
/proc/27343/environ: LOG_DIRS=/var/log/hadoop-yarn/containers/application_1485807340469_0019/container_1485807340469_0019_01_000003
/proc/27343/environ: MALLOC_ARENA_MAX=4
/proc/27343/environ: HADOOP_JOB_HISTORYSERVER_HEAPSIZE=2396
/proc/27343/environ: YARN_ROOT_LOGGER=INFO,DRFA
/proc/27343/environ: NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
/proc/27343/environ: PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
/proc/27343/environ: CONF_DIR=/etc/hadoop/conf
/proc/27343/environ: YARN_IDENT_STRING=yarn
/proc/27343/environ: HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs
/proc/27343/environ: DAEMON_FLAGS=nodemanager
/proc/27343/environ: HADOOP_CLIENT_OPTS=
/proc/27343/environ: PWD=/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000003
/proc/27343/environ: HADOOP_COMMON_HOME=/usr/lib/hadoop
/proc/27343/environ: HADOOP_YARN_HOME=/usr/lib/hadoop-yarn
/proc/27343/environ: JAVA_HOME=/usr/lib/jvm/java-openjdk
/proc/27343/environ: HADOOP_CLASSPATH=/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000003:job.jar/job.jar:job.jar/classes/:job.jar/lib/*:/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000003/*:/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000001:job.jar/job.jar:job.jar/classes/:job.jar/lib/*:/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000001/*:/usr/lib/hbase/*:/usr/lib/hbase/lib/*:/etc/tez/conf:/usr/lib/tez/*:/usr/lib/tez/lib/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar:/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar:/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar:/usr/share/aws/emr/cloudwatch-sink/lib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*
/proc/27343/environ: HADOOP_CONF_DIR=/etc/hadoop/conf
/proc/27343/environ: DAEMON=hadoop-yarn-nodemanager
/proc/27343/environ: STDOUT_LOGFILE_ENV=/var/log/hadoop-yarn/containers/application_1485807340469_0019/container_1485807340469_0019_01_000003/stdout
/proc/27343/environ: LANG=en_US.UTF-8
/proc/27343/environ: SLEEP_TIME=10
/proc/27343/environ: XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
/proc/27343/environ: HADOOP_OPTS= -server -XX:OnOutOfMemoryError='kill -9 %p' -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -XX:OnOutOfMemoryError='kill -9 %p' -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
/proc/27343/environ: PIDFILE=/var/run/hadoop-yarn/yarn-yarn-nodemanager.pid
/proc/27343/environ: YARN_LOG_DIR=/var/log/hadoop-yarn
/proc/27343/environ: DESC=Hadoop nodemanager
/proc/27343/environ: EXEC_PATH=/usr/lib/hadoop-yarn/sbin/yarn-daemon.sh
/proc/27343/environ: SHLVL=5
/proc/27343/environ: HOME=/home/
/proc/27343/environ: JVM_PID=27333
/proc/27343/environ: YARN_CONF_DIR=/etc/hadoop/conf
/proc/27343/environ: YARN_LOGFILE=yarn-yarn-nodemanager-ip-172-31-5-156.log
/proc/27343/environ: YARN_NODEMANAGER_HEAPSIZE=2048
/proc/27343/environ: UPSTART_INSTANCE=
/proc/27343/environ: HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
/proc/27343/environ: LOGNAME=hadoop
/proc/27343/environ: NM_PORT=8041
/proc/27343/environ: HADOOP_HOME_WARN_SUPPRESS=true
/proc/27343/environ: CLASSPATH=/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000003:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/lib/*:/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar:/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar:/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar:/usr/share/aws/emr/cloudwatch-sink/lib/*:/usr/lib/hadoop-mapreduce/share/hadoop/mapreduce/*:/usr/lib/hadoop-mapreduce/share/hadoop/mapreduce/lib/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/lib/*:/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar:/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar:/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar:/usr/share/aws/emr/cloudwatch-sink/lib/*:job.jar/job.jar:job.jar/classes/:job.jar/lib/*:/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000003/*
/proc/27343/environ: CONTAINER_ID=container_1485807340469_0019_01_000003
/proc/27343/environ: YARN_PROXYSERVER_HEAPSIZE=2396
/proc/27343/environ: HADOOP_ROOT_LOGGER=DEBUG,console
/proc/27343/environ: WORKING_DIR=/var/lib/hadoop-yarn
/proc/27343/environ: UPSTART_JOB=hadoop-yarn-nodemanager
/proc/27343/environ: HADOOP_NAMENODE_HEAPSIZE=1740
/proc/27343/environ: HADOOP_DATANODE_HEAPSIZE=757
/proc/27343/environ: YARN_RESOURCEMANAGER_HEAPSIZE=2396
/proc/27343/environ: BASH_FUNC_run_prestart()=() {  su -s /bin/bash $SVC_USER -c "cd $WORKING_DIR && $EXEC_PATH --config '$CONF_DIR' start $DAEMON_FLAGS"
/proc/27343/environ: _=/usr/lib/jvm/java-openjdk/bin/java
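
Since /proc/&lt;pid&gt;/environ is NUL-separated, another handy variant along the same lines is to translate the separators into newlines and grep for the variable you care about, for example the container ID:

$ sudo cat /proc/27343/environ | tr '\0' '\n' | grep CONTAINER_ID
CONTAINER_ID=container_1485807340469_0019_01_000003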

Get the driver’s IP in Spark yarn-cluster mode


In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
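
For reference, this is the kind of submission that ends up in that situation: passing --deploy-mode cluster to spark-submit puts the driver inside the YARN application master. A minimal sketch (the examples jar path varies by distribution and Spark version):

spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode cluster \
  /usr/lib/spark/lib/spark-examples.jar 100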

Sometimes we will have a bunch of logs for a terminated cluster and we need to find out which node was the driver in cluster mode.

Searching for “driverUrl” in the application/container logs will find it:

find . -iname "*.gz" | xargs zgrep "driverUrl"
./container_1459071485818_0006_02_000001/stderr.gz:15/03/28 05:10:47 INFO YarnAllocator: Launching ExecutorRunnable. driverUrl: spark://CoarseGrainedScheduler@172.31.16.15:47452,  executorHostname: ip-172-31-16-13.ec2.internal
...
./container_1459071485818_0006_02_000001/stderr.gz:15/03/28 05:10:47 INFO YarnAllocator: Launching ExecutorRunnable. driverUrl: spark://CoarseGrainedScheduler@172.31.16.15:47452,  executorHostname: ip-172-31-16-14.ec2.internal

In this case the driver was running on 172.31.16.15.
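
If you only want the unique driver endpoints, a small variation of the same command (assuming your zgrep passes the -o and -h options through to grep) trims the output down:

find . -iname "*.gz" | xargs zgrep -oh "driverUrl: spark://[^,]*" | sort -u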

yarn: execute a script on all the nodes in the cluster


This is more of a Linux scripting trick, but sometimes we have a Hadoop (YARN) cluster running and need to run a post-install script or activity on all the nodes in the cluster:

for i in `yarn node -list | cut -f 1 -d ':' | grep "ip"`; do ssh -i your-key.pem hadoop@$i 'hadoop fs -copyToLocal s3://mybucket/myscript.sh /home/hadoop/myscript.sh && chmod +x /home/hadoop/myscript.sh && /home/hadoop/myscript.sh' ; done

Note: the your-key.pem file needs to be present on the master node.
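
The same thing can be run in parallel with xargs instead of a serial for loop, reusing the node-listing filter that appears again later in this post (a sketch; adjust the key and script paths to your setup):

yarn node -list | sed -n "s/^\(ip[^:]*\):.*/\1/p" | xargs -t -I{} -P10 ssh -o StrictHostKeyChecking=no -i your-key.pem hadoop@{} 'hadoop fs -copyToLocal s3://mybucket/myscript.sh /home/hadoop/myscript.sh && chmod +x /home/hadoop/myscript.sh && /home/hadoop/myscript.sh'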

YARN / Map Reduce memory settings


In Hadoop 1, we used mapred.child.java.opts to set the Java heap size for the TaskTracker child processes.

With YARN, that parameter has been deprecated in favor of:

  • mapreduce.map.java.opts – This parameter is passed to the JVM for mappers.
  • mapreduce.reduce.java.opts – This parameter is passed to the JVM for reducers.

The key thing to understand is that if both mapred.child.java.opts and mapreduce.{map|reduce}.java.opts are specified, the settings in mapred.child.java.opts will be ignored.

These are set as in the following example:

mapreduce.map.java.opts = -Xmx1280m
mapreduce.reduce.java.opts = -Xmx2304m

A common error that we will see in our application if it tries to run beyond those limits is, for example:

.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.OutOfMemoryError: Java heap space

Now, assuming an error like the one above happens in the reduce phase, we can increase the value of mapreduce.reduce.java.opts to:

mapreduce.reduce.java.opts = -Xmx3584m

But we will face another error if we do not also increase the mapreduce.reduce.memory.mb value accordingly:

mapreduce.reduce.memory.mb = 4096

A typical error message when the available container memory is lower than the maximum Java heap size looks like this:

containerID=container_1420691746319_0001_01_000151] is running beyond physical memory limits. Current usage: 2.6 GB of 2.5 GB physical memory used; 4.8 GB of 12.5 GB virtual memory used. Killing container.

The general rule of thumb is that -Xmx (mapreduce.reduce.java.opts) should be about 80% of the corresponding mapreduce.reduce.memory.mb value, to account for stack and PermGen space.
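
If your job's driver goes through ToolRunner/GenericOptionsParser, these values can also be passed per job on the command line. A sketch using the reduce-side values from above (the jar name, main class and paths are placeholders, and the 1536 MB map container is just an illustrative pairing for the -Xmx1280m heap):

hadoop jar my-job.jar com.example.MyJob \
  -D mapreduce.map.memory.mb=1536 -D mapreduce.map.java.opts=-Xmx1280m \
  -D mapreduce.reduce.memory.mb=4096 -D mapreduce.reduce.java.opts=-Xmx3584m \
  /input /output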

As a summary:

Mapper and Reducer Settings:

  • mapreduce.map.memory.mb – This defines the total memory of the container in which each mapper JVM runs.  This value is in megabytes (MB).
  • mapreduce.reduce.memory.mb – This defines the total memory of the container in which each reducer JVM runs.  This value is in megabytes (MB).

[Figure: YARN map/reduce memory settings]

Containers

As we should know by now, with YARN memory allocation everything is a container. This includes the following components:

  • Application Master – It controls the execution of the application (such as re-running tasks when they fail).
  • Mappers – Process all of the input files and grab the information out of them necessary to perform the job.
  • Reducers – These containers take information from the mappers and distill it into the final result of the job (which goes into the output files.)

Global Settings

These two properties control container memory allocation (a quick way to check them on a node is shown right after the list):

  • yarn.scheduler.minimum-allocation-mb – This is the smallest allowed value for a container.  Any container request asking for less than this will be set to this value.  This value is in megabytes (MB).
  • yarn.scheduler.maximum-allocation-mb – This is the largest allowed value for a container.  Any container request asking for more than this will be set to this value.  This value is in megabytes (MB).
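
To check what a given cluster is actually using, one quick sketch is to grep the node's yarn-site.xml (the path is the one used throughout this post; if the properties are not set there, the defaults apply):

grep -A2 "yarn.scheduler.minimum-allocation-mb" /etc/hadoop/conf/yarn-site.xml
grep -A2 "yarn.scheduler.maximum-allocation-mb" /etc/hadoop/conf/yarn-site.xml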

 

yarn: change configuration and restart node manager on a live cluster


This procedure changes the YARN configuration on a live cluster, propagates the change to all the nodes, and restarts the YARN NodeManager.

Both commands list all the nodes in the cluster and then filter the DNS names in order to execute a remote command via SSH. You can customize the sed filter depending on your needs; here it matches DNS names in the Elastic MapReduce format (ip-xx-xx-xx-xx.eu-west-1.compute.internal).

1. Upload the private key (.pem) file you use to access the cluster to the master node. Restrict its permissions to 600 or stricter (i.e. chmod 600 MyKeyName.pem).

2. Edit ~/conf/yarn-site.xml on the master node and use a command like this to propagate the change across the cluster:

yarn node -list|sed -n "s/^\(ip[^:]*\):.*/\1/p" | xargs -t -I{} -P10 scp -o StrictHostKeyChecking=no -i ~/MyKeyName.pem ~/conf/yarn-site.xml hadoop@{}:/home/hadoop/conf/
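
To double-check that every node received the same file, one option (a sketch, using the same node listing and filter) is to compare checksums across the cluster:

yarn node -list|sed -n "s/^\(ip[^:]*\):.*/\1/p" | xargs -t -I{} -P10 ssh -o StrictHostKeyChecking=no -i ~/MyKeyName.pem hadoop@{} "md5sum /home/hadoop/conf/yarn-site.xml"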

3. This command restarts the YARN NodeManager on all the nodes. On this cluster the NodeManager runs as the hadoop-yarn-nodemanager service (see the UPSTART_JOB variable in the environment dump at the top of this post), so we stop and start that service on each node:

yarn node -list|sed -n "s/^\(ip[^:]*\):.*/\1/p" | xargs -t -I{} -P10 ssh -o StrictHostKeyChecking=no -i ~/MyKeyName.pem hadoop@{} "sudo stop hadoop-yarn-nodemanager; sudo start hadoop-yarn-nodemanager"
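
After a minute or so the NodeManagers should re-register with the ResourceManager; listing the nodes again and checking that the reported node count matches your cluster size is a quick sanity check:

yarn node -list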