log4j vulnerability – quick notes


This issue may lead to remote code execution (RCE) via use of JNDI.

  • Vendor: Apache Software Foundation
    • Product: Apache Log4j
      • <=2.14.1: affects 2.14.1 and prior versions

Fix: log4j 2.15.0

To list all JAR files on your system containing the vulnerable class file (including fat JARs), you can use:

for f in $(find . -name '*.jar' 2>/dev/null); do echo "Checking $f…"; unzip -l "$f" | grep -F org/apache/logging/log4j/core/lookup/JndiLookup.class; done

Additional details here:

https://www.cve.org/CVERecord?id=CVE-2021-44228

Raspberry Pi + Java R2D2 Robot – Part 1


This project was born as a Coronavirus project for my kids (you know how difficult could be to have your kids all day at home… oh school: please come back, we forgive you!)

On this post we cover the initial steps, including the full platform/chassis running, commanded by a Raspberry Pi and some Java code.

On this video we can see the chassis running:

 

The body of the R2D2 will be a 12″ cardboard tube from Home Depot. The chassis aluminum flat bars are also from HD:

2020-04-18 10.16.05

 

R2D2 head will be molded from a balloon, newspaper paper and paste (flour + water):

2020-04-18 11.46.58

 

Chassis design:

2020-04-18 11.47.28

 

Cutting, drilling and assembling:

2020-04-18 12.28.45

2020-04-18 12.34.452020-04-18 12.53.332020-04-18 12.57.15

 

 

 

 

 

 

 

 

 

 

 

This is how the chassis will fit into the tube:

2020-04-18 12.46.53

 

Soldering:

2020-04-18 13.09.55-4

 

Wheels in place:

2020-04-18 14.02.44

 

Preliminary setup and code testing:

2020-04-18 14.13.242020-04-19 11.05.302020-04-19 12.09.46-22020-04-19 12.09.46

 

 

 

 

 

 

 

 

 

 

Final assembly of the chassis:

2020-04-19 12.42.58

 

We are using two separate battery packs: one for the raspberry and a second one (more amps and 12V) for the motors and the L298n motor driver:2020-04-19 13.02.19

 

To be continued…..

2020-04-19 18.24.02

Proyecto Raspberry Pi + Java R2D2 Robot – Parte 1


Este proyecto nacio a causa del Coronavirus como una manera de mantener entretenidos a mis hijos (me imagino que ya saben que complicado puede ser tener a los chicos todo el dia en casa… Escuela volve!!)

En este post les muestro los pasos iniciales, incluyendo la construccion de la plataforma/chasis, y su funcionamiento comandado por un Raspberry Pi y algo de codigo en Java

En este video, vemos como funciona hasta ahora con el chassis ya ensamblado:

 

El “cuerp” del R2D2 es un tubo de carton de 12″ de los que se usan en construccion. El chasis esta construido con barras planas de aluminio:

2020-04-18 10.16.05

 

La cabeza del R2D2 se moldea con un globo, papel de diario y engrudo (agua y harina):

2020-04-18 11.46.58

 

Diseño del chasis:

2020-04-18 11.47.28

 

Cortando, taladrando y ensamblando:

2020-04-18 12.28.45

2020-04-18 12.34.452020-04-18 12.53.332020-04-18 12.57.15

 

 

 

 

 

 

 

 

 

 

 

Asi quedara el chasis en el tubo:

2020-04-18 12.46.53

 

Soldando:

2020-04-18 13.09.55-4

 

Ruedas en su lugar:

2020-04-18 14.02.44

 

Armado preliminar y prueba de codigo:

2020-04-18 14.13.242020-04-19 11.05.302020-04-19 12.09.46-22020-04-19 12.09.46

 

 

 

 

 

 

 

 

 

 

Ensambrado final del chasis:

2020-04-19 12.42.58

 

Usamos dos paquetes de baterias recargables: una para el Raspberry y otra (con mas amperaje y de 12V) para los motores y el driver L298N:

2020-04-19 13.02.19

 

Continuara…..

2020-04-19 18.24.02

Count and Say


/*Lets write an algorithm that, given an initial value, will produce the next 'count and say' sequence:
1, 11, 21, 1211, 111221, 312211, 13112221, 1113213211, …
*/

public class CountAndSay{

  public static void main(String[] args){

     System.out.println("Count and Say algorithm");

     System.out.println("next sequence is: " + countAndSay(args[0].toString()));

  }

  public static String countAndSay(String input){

     Boolean change=false;
     String output="";
     int count=1;

     char[] str = input.toCharArray();

     System.out.println("Input is: " + input);

     for (int i=0;i<str.length;i++){

       if (i+1<str.length && str[i]==str[i+1]){
          count++;
       }
       else{
          output = output + count + str[i];
          count=1;
       }

     }

     return output;

  }

}

 

 

HBase and Zookeeper debugging


I came across some scenarios where an application (i.e. Mapreduce) communicating to HBase through YARN could silently fail with a timeout like the following:

2017-01-30 19:42:03,657 DEBUG [main] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: locateRegionInMeta parentTable=hbase:meta, metaLocation=, attempt=9 of 35 failed; retrying after sleep of 10095 because: Failed after attempts=36, exceptions:
Mon Jan 30 19:42:03 UTC 2017, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68463: row 'test2,#cmrNo acctNo,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-172-31-3-246.us-west-2.compute.internal,16000,1485539268192, seqNum=0

The root cause for this behavior here wasn’t related to any missconfiguration at server/networking side, but a library missing in the class path.

When there is a zookeeper issue, depending on the retry parameters the exceptions are not visible.

On this case, In the Mapreduce Java application I’ve added/modified the following parameters that lead into more visibility in the communication layer between Zookeeper and HBase:

conf.set("hbase.client.retries.number", Integer.toString(1));
conf.set("zookeeper.session.timeout", Integer.toString(60000));
conf.set("zookeeper.recovery.retry", Integer.toString(1));


After this, the following exception was visible:

Exception: java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge
 at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:157)
 at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: com.google.protobuf.ServiceException: java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge

 

Playing around these parameters will cause the application exit quickly when there is a problem with the cluster. This can be desirable in a production environment.

Reducing the parameters to a more conservative value could yield better recovery times.  Setting zookeeper.recovery.retry to 0 will still result in up to two connection attempts made to all zk servers in the quorum and cause and application failure to happen in under a minute should there be a loss of zookeeper connectivity during execution.

 

As an additional note, if you are receiving timeouts because the application is trying to contact localhost instead of the quorum server, you can set the explicit parameters:

// HBase through MR on Yarn is trying to connect to localhost instead of quorum.
        conf.set("hbase.zookeeper.quorum","172.31.3.246");
        conf.set("hbase.zookeeper.property.clientPort","2181");

 

I’ve added a couple of examples of Mapreduce applications for HBase here: https://github.com/hvivani/bigdata/tree/master/hbase

 

Some additional notes on this behavior: https://discuss.pivotal.io/hc/en-us/articles/200933006-Hbase-application-hangs-indefinitely-connecting-to-zookeeper

 

Checking Yarn child execution environment


Never go out without this:

$ sudo -u yarn jps
27343 YarnChild
4156 NodeManager
27292 Jps

$ sudo strings -f /proc/27343/environ
/proc/27343/environ: STDERR_LOGFILE_ENV=/var/log/hadoop-yarn/containers/application_1485807340469_0019/container_1485807340469_0019_01_000003/stderr
/proc/27343/environ: SHELL=/bin/bash
/proc/27343/environ: TERM=linux
/proc/27343/environ: HADOOP_HOME=/usr/lib/hadoop
/proc/27343/environ: YARN_PID_DIR=/var/run/hadoop-yarn
/proc/27343/environ: NM_HOST=ip-172-31-5-156.us-west-2.compute.internal
/proc/27343/environ: HADOOP_PREFIX=/usr/lib/hadoop
/proc/27343/environ: YARN_OPTS= -XX:OnOutOfMemoryError='kill -9 %p' -XX:OnOutOfMemoryError='kill -9 %p' -server  -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=yarn-yarn-nodemanager-ip-172-31-5-156.log -Dyarn.log.file=yarn-yarn-nodemanager-ip-172-31-5-156.log -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.root.logger=INFO,DRFA -Dyarn.root.logger=INFO,DRFA -Dsun.net.inetaddr.ttl=30 -Djava.library.path=:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native
/proc/27343/environ: NM_AUX_SERVICE_mapreduce_shuffle=AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
/proc/27343/environ: YARN_NICENESS=0
/proc/27343/environ: NM_HTTP_PORT=8042
/proc/27343/environ: LOCAL_DIRS=/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019,/mnt1/yarn/usercache/hadoop/appcache/application_1485807340469_0019
/proc/27343/environ: USER=hadoop
/proc/27343/environ: JAVA_LIBRARY_PATH=:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native
/proc/27343/environ: LD_LIBRARY_PATH=/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000003:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
/proc/27343/environ: JSVC_HOME=/usr/lib/bigtop-utils
/proc/27343/environ: HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec
/proc/27343/environ: HADOOP_TOKEN_FILE_LOCATION=/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000003/container_tokens
/proc/27343/environ: SVC_USER=yarn
/proc/27343/environ: LOG_DIRS=/var/log/hadoop-yarn/containers/application_1485807340469_0019/container_1485807340469_0019_01_000003
/proc/27343/environ: MALLOC_ARENA_MAX=4
/proc/27343/environ: HADOOP_JOB_HISTORYSERVER_HEAPSIZE=2396
/proc/27343/environ: YARN_ROOT_LOGGER=INFO,DRFA
/proc/27343/environ: NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
/proc/27343/environ: PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
/proc/27343/environ: CONF_DIR=/etc/hadoop/conf
/proc/27343/environ: YARN_IDENT_STRING=yarn
/proc/27343/environ: HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs
/proc/27343/environ: DAEMON_FLAGS=nodemanager
/proc/27343/environ: HADOOP_CLIENT_OPTS=
/proc/27343/environ: PWD=/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000003
/proc/27343/environ: HADOOP_COMMON_HOME=/usr/lib/hadoop
/proc/27343/environ: HADOOP_YARN_HOME=/usr/lib/hadoop-yarn
/proc/27343/environ: JAVA_HOME=/usr/lib/jvm/java-openjdk
/proc/27343/environ: HADOOP_CLASSPATH=/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000003:job.jar/job.jar:job.jar/classes/:job.jar/lib/*:/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000003/*:/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000001:job.jar/job.jar:job.jar/classes/:job.jar/lib/*:/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000001/*:/usr/lib/hbase/*:/usr/lib/hbase/lib/*:/etc/tez/conf:/usr/lib/tez/*:/usr/lib/tez/lib/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar:/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar:/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar:/usr/share/aws/emr/cloudwatch-sink/lib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*
/proc/27343/environ: HADOOP_CONF_DIR=/etc/hadoop/conf
/proc/27343/environ: DAEMON=hadoop-yarn-nodemanager
/proc/27343/environ: STDOUT_LOGFILE_ENV=/var/log/hadoop-yarn/containers/application_1485807340469_0019/container_1485807340469_0019_01_000003/stdout
/proc/27343/environ: LANG=en_US.UTF-8
/proc/27343/environ: SLEEP_TIME=10
/proc/27343/environ: XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
/proc/27343/environ: HADOOP_OPTS= -server -XX:OnOutOfMemoryError='kill -9 %p' -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -XX:OnOutOfMemoryError='kill -9 %p' -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
/proc/27343/environ: PIDFILE=/var/run/hadoop-yarn/yarn-yarn-nodemanager.pid
/proc/27343/environ: YARN_LOG_DIR=/var/log/hadoop-yarn
/proc/27343/environ: DESC=Hadoop nodemanager
/proc/27343/environ: EXEC_PATH=/usr/lib/hadoop-yarn/sbin/yarn-daemon.sh
/proc/27343/environ: SHLVL=5
/proc/27343/environ: HOME=/home/
/proc/27343/environ: JVM_PID=27333
/proc/27343/environ: YARN_CONF_DIR=/etc/hadoop/conf
/proc/27343/environ: YARN_LOGFILE=yarn-yarn-nodemanager-ip-172-31-5-156.log
/proc/27343/environ: YARN_NODEMANAGER_HEAPSIZE=2048
/proc/27343/environ: UPSTART_INSTANCE=
/proc/27343/environ: HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
/proc/27343/environ: LOGNAME=hadoop
/proc/27343/environ: NM_PORT=8041
/proc/27343/environ: HADOOP_HOME_WARN_SUPPRESS=true
/proc/27343/environ: CLASSPATH=/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000003:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/lib/*:/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar:/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar:/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar:/usr/share/aws/emr/cloudwatch-sink/lib/*:/usr/lib/hadoop-mapreduce/share/hadoop/mapreduce/*:/usr/lib/hadoop-mapreduce/share/hadoop/mapreduce/lib/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/lib/*:/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar:/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar:/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar:/usr/share/aws/emr/cloudwatch-sink/lib/*:job.jar/job.jar:job.jar/classes/:job.jar/lib/*:/mnt/yarn/usercache/hadoop/appcache/application_1485807340469_0019/container_1485807340469_0019_01_000003/*
/proc/27343/environ: CONTAINER_ID=container_1485807340469_0019_01_000003
/proc/27343/environ: YARN_PROXYSERVER_HEAPSIZE=2396
/proc/27343/environ: HADOOP_ROOT_LOGGER=DEBUG,console
/proc/27343/environ: WORKING_DIR=/var/lib/hadoop-yarn
/proc/27343/environ: UPSTART_JOB=hadoop-yarn-nodemanager
/proc/27343/environ: HADOOP_NAMENODE_HEAPSIZE=1740
/proc/27343/environ: HADOOP_DATANODE_HEAPSIZE=757
/proc/27343/environ: YARN_RESOURCEMANAGER_HEAPSIZE=2396
/proc/27343/environ: BASH_FUNC_run_prestart()=() {  su -s /bin/bash $SVC_USER -c "cd $WORKING_DIR && $EXEC_PATH --config '$CONF_DIR' start $DAEMON_FLAGS"
/proc/27343/environ: _=/usr/lib/jvm/java-openjdk/bin/java

Debugging Java Threads


Which Java process is using most of the CPU:

 $ ps u -C java

Generate the Java thread dump:

$ jstack -l PId > PId-threads.txt

From the Java threads I can count:

$ awk '/State: / { print }' < PId-threads.txt  | sort | uniq -c
 450    java.lang.Thread.State: BLOCKED (on object monitor)
 240    java.lang.Thread.State: RUNNABLE
  47    java.lang.Thread.State: TIMED_WAITING (on object monitor)
 294    java.lang.Thread.State: TIMED_WAITING (parking)
  31    java.lang.Thread.State: TIMED_WAITING (sleeping)
  42    java.lang.Thread.State: WAITING (on object monitor)
  62    java.lang.Thread.State: WAITING (parking)

From this, we search on the ones that are “waiting to lock”:

$ awk '/waiting to lock / { print }' < PId-threads.txt  | sort | uniq -c
   1     - waiting to lock <0x0000000600a027d8> (a org.apache.log4j.spi.RootLogger)
 294     - waiting to lock <0x0000000600f2e770> (a java.util.Hashtable)
  19     - waiting to lock <0x0000000600f36fc8> (a java.lang.Object)
   1     - waiting to lock <0x000000072f6e6708> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)

Main thread blocking here is: <0x0000000600f2e770>.

From the BLOCKED threads, we have many processes waiting for getConnection:

"Thread-132985" prio=10 tid=0x00007fec40784800 nid=0x662d waiting for monitor entry [0x00007fec18cd4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1449)
    - waiting to lock <0x0000000600f2e770> (a java.util.Hashtable)

And many others waiting for Connection.close:

"IPC Client (738091550) connection to /10.66.2.38:9022 from hadoop" daemon prio=10 tid=0x00007fec41c1f800 nid=0x2dcc waiting for monitor entry [0x00007fec15da6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.ipc.Client$Connection.close(Client.java:1135)
    - waiting to lock <0x0000000600f2e770> (a java.util.Hashtable)

All of them are BLOCKED by <0x0000000600f2e770>.

Opening other PId’s we can find the Java thread that has the lock on this resource. The culprit will look like:

"Thread-133346" prio=10 tid=0x00007fec40ac7800 nid=0x747e runnable [0x00007fec17cc4000]
   java.lang.Thread.State: RUNNABLE
    at java.lang.Thread.<init>(Thread.java:234)
    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:396)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1452)
    - locked <0x0000000600f2e770> (a java.util.Hashtable)
    at org.apache.hadoop.ipc.Client.call(Client.java:1381)
    at org.apache.hadoop.ipc.Client.call(Client.java:1363)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:215)
    at com.sun.proxy.$Proxy42.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:163)
    at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
    at com.sun.proxy.$Proxy43.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
    at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:294)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:152)
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:319)
    - locked <0x0000000733033970> (a org.apache.hadoop.mapred.ClientServiceDelegate)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419)
    at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:532)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
    - locked <0x0000000733032e20> (a org.apache.hadoop.mapreduce.Job)
    at org.apache.hadoop.mapreduce.Job.getJobState(Job.java:347)
    at org.apache.hadoop.mapred.JobClient$NetworkedJob.getJobState(JobClient.java:295)
    - locked <0x0000000733032e10> (a org.apache.hadoop.mapred.JobClient$NetworkedJob)
    at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:244)
    at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:547)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72)

This kind of Blocked Status is not technically a Deadlock, but just a thread blocking other threads: locked status for a resource while many other objects start queuing waiting for the same resource.

Java change default version / cambiar la version Java por defecto


If we have more than one Java version installed on your Linux server (Redhat flavor) you can change defaults using ‘alternatives’ command:

[hadoop@ip-172-31-36-252 ~]$ sudo /usr/sbin/alternatives --config java

There are 2 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
   2           /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java

Enter to keep the current selection[+], or type selection number: 2
[hadoop@ip-172-31-36-252 ~]$ sudo /usr/sbin/alternatives --config java

There are 2 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*  1           /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
 + 2           /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java

Enter to keep the current selection[+], or type selection number:
[hadoop@ip-172-31-36-252 ~]$

 

FileInputFormat vs. CombineFileInputFormat


When you put a file into HDFS, it is converted to blocks of 128 MB. (Default value for HDFS on EMR) Consider a file big enough to consume 10 blocks. When you read that file from HDFS as an input for a MapReduce job, the same blocks are usually mapped, one by one, to splits.In this case, the file is divided into 10 splits (which implies means 10 map tasks) for processing. By default, the block size and the split size are equal, but the sizes are dependent on the configuration settings for the InputSplit class.

From a Java programming perspective, the class that holds the responsibility of this conversion is called an InputFormat, which is the main entry point into reading data from HDFS. From the blocks of the files, it creates a list of InputSplits. For each split, one mapper is created. Then each InputSplit is divided into records by using the RecordReader class. Each record represents a key-value pair.

FileInputFormat vs. CombineFileInputFormat

Before a MapReduce job is run, you can specify the InputFormat class to be used. The implementation of FileInputFormat requires you to create an instance of the RecordReader, and as mentioned previously, the RecordReader creates the key-value pairs for the mappers.

FileInputFormat is an abstract class that is the basis for a majority of the implementations of InputFormat. It contains the location of the input files and an implementation of how splits must be produced from these files. How the splits are converted into key-value pairs is defined in the subclasses. Some example of its subclasses are TextInputFormat, KeyValueTextInputFormat, and CombineFileInputFormat.

Hadoop works more efficiently with large files (files that occupy more than 1 block). FileInputFormat converts each large file into splits, and each split is created in a way that contains part of a single file. As mentioned, one mapper is generated for each split.

FileInputFormatLargeFile

However, when the input files are smaller than the default block size, many splits (and therefore, many mappers) are created. This arrangement makes the job inefficient. This Figure shows how too many mappers are created when FileInputFormat is used for many small files.

FileInputFormatManySmallFiles

To avoid this situation, CombineFileInputFormat is introduced. This InputFormat works well with small files, because it packs many of them into one split so there are fewer mappers, and each mapper has more data to process. This Figure shows how CombineFileInputFormat treats the small files so that fewer mappers are created.

CombineFileInputFormatSmallFiles

Consider boosting spark.yarn.executor.memoryOverhead


This is a very specific error related to the Spark Executor and the YARN container coexistence.

You will typically see errors like this one on the application container logs:

15/03/12 18:53:46 WARN YarnAllocator: Container killed by YARN for exceeding memory limits. 9.3 GB of 9.3 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
15/03/12 18:53:46 ERROR YarnClusterScheduler: Lost executor 21 on ip-xxx-xx-xx-xx: Container killed by YARN for exceeding memory limits. 9.3 GB of 9.3 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.

To overcome this, you need to keep in mind how Yarn container and the executor are set in memory:

spark-tuning-yarn-memory

Memory used by Spark Executor is exceeding the predefined limits (often caused by a few spikes) and that is causing YARN to kill the container with the previously mentioned message error.

By default ‘spark.yarn.executor.memoryOverhead’ parameter is set to 384 MB. This value could be low depending on your application and the data load.

Suggested value for this parameter is ‘executorMemory * 0.10’.

We can increase the value for ‘spark.yarn.executor.memoryOverhead’ to 1GB on spark-submit bu adding this to the command line:

–conf spark.yarn.executor.memoryOverhead=1024

For reference, this fix was added on Jira 1930:

+  <td><code>spark.yarn.executor.memoryOverhead</code></td>