Debugging Java Threads

Which Java process is using most of the CPU:

 $ ps u -C java

Generate the Java thread dump:

$ jstack -l PId > PId-threads.txt

From the Java threads I can count:

$ awk '/State: / { print }' < PId-threads.txt  | sort | uniq -c
 450    java.lang.Thread.State: BLOCKED (on object monitor)
 240    java.lang.Thread.State: RUNNABLE
  47    java.lang.Thread.State: TIMED_WAITING (on object monitor)
 294    java.lang.Thread.State: TIMED_WAITING (parking)
  31    java.lang.Thread.State: TIMED_WAITING (sleeping)
  42    java.lang.Thread.State: WAITING (on object monitor)
  62    java.lang.Thread.State: WAITING (parking)

From this, we search on the ones that are “waiting to lock”:

$ awk '/waiting to lock / { print }' < PId-threads.txt  | sort | uniq -c
   1     - waiting to lock <0x0000000600a027d8> (a org.apache.log4j.spi.RootLogger)
 294     - waiting to lock <0x0000000600f2e770> (a java.util.Hashtable)
  19     - waiting to lock <0x0000000600f36fc8> (a java.lang.Object)
   1     - waiting to lock <0x000000072f6e6708> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)

Main thread blocking here is: <0x0000000600f2e770>.

From the BLOCKED threads, we have many processes waiting for getConnection:

"Thread-132985" prio=10 tid=0x00007fec40784800 nid=0x662d waiting for monitor entry [0x00007fec18cd4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.ipc.Client.getConnection(
    - waiting to lock <0x0000000600f2e770> (a java.util.Hashtable)

And many others waiting for Connection.close:

"IPC Client (738091550) connection to / from hadoop" daemon prio=10 tid=0x00007fec41c1f800 nid=0x2dcc waiting for monitor entry [0x00007fec15da6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.ipc.Client$Connection.close(
    - waiting to lock <0x0000000600f2e770> (a java.util.Hashtable)

All of them are BLOCKED by <0x0000000600f2e770>.

Opening other PId’s we can find the Java thread that has the lock on this resource. The culprit will look like:

"Thread-133346" prio=10 tid=0x00007fec40ac7800 nid=0x747e runnable [0x00007fec17cc4000]
   java.lang.Thread.State: RUNNABLE
    at java.lang.Thread.<init>(
    at org.apache.hadoop.ipc.Client$Connection.<init>(
    at org.apache.hadoop.ipc.Client.getConnection(
    - locked <0x0000000600f2e770> (a java.util.Hashtable)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(
    at com.sun.proxy.$Proxy42.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(
    at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(
    at java.lang.reflect.Method.invoke(
    at com.sun.proxy.$Proxy43.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(
    at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(
    at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(
    - locked <0x0000000733033970> (a org.apache.hadoop.mapred.ClientServiceDelegate)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(
    at org.apache.hadoop.mapred.YARNRunner.getJobStatus(
    at org.apache.hadoop.mapreduce.Job$
    at org.apache.hadoop.mapreduce.Job$
    at Method)
    at org.apache.hadoop.mapreduce.Job.updateStatus(
    - locked <0x0000000733032e20> (a org.apache.hadoop.mapreduce.Job)
    at org.apache.hadoop.mapreduce.Job.getJobState(
    at org.apache.hadoop.mapred.JobClient$NetworkedJob.getJobState(
    - locked <0x0000000733032e10> (a org.apache.hadoop.mapred.JobClient$NetworkedJob)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(

This kind of Blocked Status is not technically a Deadlock, but just a thread blocking other threads: locked status for a resource while many other objects start queuing waiting for the same resource.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s