Sometimes we might need to create thousands or millions of files at once.
This command creates one file per number in the given range, using touch with shell brace expansion:
touch bspl{00001..70000}.c
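With very large ranges the expanded argument list can exceed the shell's ARG_MAX limit and touch will fail with "Argument list too long". A workaround is to generate the names and feed them to touch in batches via xargs (a minimal sketch using the same naming pattern):
seq -f "bspl%05g.c" 1 70000 | xargs touch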
Whether you are trying to figure out a memory leak in an Atlassian application or a DB engine, or you just want to know why your Java process is hung or eating all the CPU, you will find yourself performing these debugging steps. Here are a few notes about taking Java thread dumps and checking what is going on under the hood.
Which Java process is using most of the CPU:
$ ps u -C java
Generate the Java thread dump:
$ jstack -l PId > PId-threads.txt
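To map CPU usage to individual threads in the dump, a common approach (a sketch, not tied to any particular application) is to list the threads of the process with top and convert the ID of the busiest thread to hex, which matches the nid= field in the jstack output:
$ top -H -p PId
$ printf '0x%x\n' <thread_id_from_top>
Here <thread_id_from_top> is just a placeholder for the PID column shown by top -H.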
From the thread dump we can count the thread states:
$ awk '/State: / { print }' < PId-threads.txt | sort | uniq -c
450 java.lang.Thread.State: BLOCKED (on object monitor)
240 java.lang.Thread.State: RUNNABLE
47 java.lang.Thread.State: TIMED_WAITING (on object monitor)
294 java.lang.Thread.State: TIMED_WAITING (parking)
31 java.lang.Thread.State: TIMED_WAITING (sleeping)
42 java.lang.Thread.State: WAITING (on object monitor)
62 java.lang.Thread.State: WAITING (parking)
From this, we search for the threads that are “waiting to lock”***:
$ awk '/waiting to lock / { print }' < PId-threads.txt | sort | uniq -c
1 - waiting to lock <0x0000000600a027d8> (a org.apache.log4j.spi.RootLogger)
294 - waiting to lock <0x0000000600f2e770> (a java.util.Hashtable)
19 - waiting to lock <0x0000000600f36fc8> (a java.lang.Object)
1 - waiting to lock <0x000000072f6e6708> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
The main lock being contended here is <0x0000000600f2e770>.
From the BLOCKED threads, we can see many threads waiting in getConnection:
"Thread-132985" prio=10 tid=0x00007fec40784800 nid=0x662d waiting for monitor entry [0x00007fec18cd4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1449)
- waiting to lock <0x0000000600f2e770> (a java.util.Hashtable)
And many others waiting for Connection.close:
"IPC Client (738091550) connection to /10.66.2.38:9022 from hadoop" daemon prio=10 tid=0x00007fec41c1f800 nid=0x2dcc waiting for monitor entry [0x00007fec15da6000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.ipc.Client$Connection.close(Client.java:1135)
- waiting to lock <0x0000000600f2e770> (a java.util.Hashtable)
All of them are BLOCKED by <0x0000000600f2e770>.
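To locate the thread that actually holds this lock, we can grep the same dump for the lock ID with a "locked" prefix:
$ grep -n 'locked <0x0000000600f2e770>' PId-threads.txt
The line numbers returned point into the stack trace of the owning thread.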
This leads us to the Java thread that holds the lock on this resource. The culprit will look like this:
"Thread-133346" prio=10 tid=0x00007fec40ac7800 nid=0x747e runnable [0x00007fec17cc4000]
java.lang.Thread.State: RUNNABLE
at java.lang.Thread.<init>(Thread.java:234)
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:396)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1452)
- locked <0x0000000600f2e770> (a java.util.Hashtable)
at org.apache.hadoop.ipc.Client.call(Client.java:1381)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:215)
at com.sun.proxy.$Proxy42.getApplicationReport(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:163)
at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy43.getApplicationReport(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:294)
at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:152)
at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:319)
- locked <0x0000000733033970> (a org.apache.hadoop.mapred.ClientServiceDelegate)
at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419)
at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:532)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
- locked <0x0000000733032e20> (a org.apache.hadoop.mapreduce.Job)
at org.apache.hadoop.mapreduce.Job.getJobState(Job.java:347)
at org.apache.hadoop.mapred.JobClient$NetworkedJob.getJobState(JobClient.java:295)
- locked <0x0000000733032e10> (a org.apache.hadoop.mapred.JobClient$NetworkedJob)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:244)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:547)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72)
This kind of blocked state is not technically a deadlock: one thread holds the lock on a resource while many other threads queue up waiting for the same lock.
*** Instead of “waiting to lock” we can also search for “parking to wait for”. “waiting to lock” appears in the thread dump when intrinsic locks (synchronized) are used, while “parking to wait for” appears when locks from java.util.concurrent are used.
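The same counting trick works for java.util.concurrent locks, just with a different search string:
$ awk '/parking to wait for / { print }' < PId-threads.txt | sort | uniq -c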
If you have more than one Java version installed on your Linux server (Red Hat flavor), you can change the default using the 'alternatives' command:
[hadoop@ip-172-31-36-252 ~]$ sudo /usr/sbin/alternatives --config java

There are 2 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
   2           /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java

Enter to keep the current selection[+], or type selection number: 2

[hadoop@ip-172-31-36-252 ~]$ sudo /usr/sbin/alternatives --config java

There are 2 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
 * 1           /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
 + 2           /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java

Enter to keep the current selection[+], or type selection number:
[hadoop@ip-172-31-36-252 ~]$
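To confirm the switch took effect, a quick check (the exact output depends on the JDK build):
$ java -version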
Install httpry:
sudo yum install httpry
or build it from source:
$ sudo yum install gcc make git libpcap-devel
$ git clone https://github.com/jbittel/httpry.git
$ cd httpry
$ make
$ sudo make install
then run:
sudo httpry -i eth0
The output will look like this:
httpry version 0.1.8 -- HTTP logging and information retrieval tool
Copyright (c) 2005-2014 Jason Bittel <jason.bittel@gmail.com>
Starting capture on eth0 interface
2016-07-27 14:20:59.598 172.31.43.18 169.254.169.254 > GET 169.254.169.254 /latest/dynamic/instance-identity/document HTTP/1.1 - -
2016-07-27 14:20:59.599 169.254.169.254 172.31.43.18 < - - - HTTP/1.0 200 OK
2016-07-27 14:22:02.034 172.31.43.18 169.254.169.254 > GET 169.254.169.254 /latest/dynamic/instance-identity/document HTTP/1.1 - -
2016-07-27 14:22:02.034 169.254.169.254 172.31.43.18 < - - - HTTP/1.0 200 OK
2016-07-27 14:23:04.640 172.31.43.18 169.254.169.254 > GET 169.254.169.254 /latest/dynamic/instance-identity/document HTTP/1.1 - -
2016-07-27 14:23:04.640 169.254.169.254 172.31.43.18 < - - - HTTP/1.0 200 OK
2016-07-27 14:24:07.122 172.31.43.18 169.254.169.254 > GET 169.254.169.254 /latest/dynamic/instance-identity/document HTTP/1.1 - -
2016-07-27 14:24:07.123 169.254.169.254 172.31.43.18 < - - - HTTP/1.0 200 OK
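If you prefer to keep a log instead of watching the console, httpry can also write its output to a file and run in the background (assuming the -o and -d options of your httpry build; check httpry -h):
sudo httpry -i eth0 -o /var/log/httpry.log -d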
Show hidden/control characters on vi:
:set list
Hide hidden/control characters on vi:
:set nolist
Remove hidden/control characters (such as trailing ^M, typed in vi as Ctrl-V Ctrl-M):
:%s/^M//g
:%s/.$//g
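The same cleanup can be done outside vi with sed (GNU sed shown; -i edits the file in place, and the file name is just an example):
sed -i 's/\r$//' myfile.txt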
This is more of a Linux scripting tip, but sometimes we have a Hadoop (YARN) cluster running and need to run a post-install script or activity on every node in the cluster:
for i in `yarn node --list | cut -f 1 -d ':' | grep "ip"`; do ssh -i your-key.pem hadoop@$i 'hadoop fs -copyToLocal s3://mybucket/myscript.sh /home/hadoop/myscript.sh && chmod +x /home/hadoop/myscript.sh && /home/hadoop/myscript.sh'; done
Note: the your-key.pem file needs to be present on the master node.
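Before pushing the real script, it can be worth verifying SSH connectivity to every node with the same loop and a trivial check command (purely illustrative):
for i in `yarn node --list | cut -f 1 -d ':' | grep "ip"`; do ssh -i your-key.pem hadoop@$i 'hostname && uptime'; done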
If you want to explore how to parallelize data ingestion into Elasticsearch, please have a look at this post I have written for Amazon AWS:
It explains how to index Common Crawl metadata into Elasticsearch using the Cascading connector directly from the S3 data source.
The Cascading source code is available here.
Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization.
1. Ganglia Monitoring Daemon (gmond)
Gmond stands for ganglia monitoring daemon. It is a lightweight service that is installed on every machine you’d like to monitor.
Gmond has four main responsibilities:
1.1 Monitor changes in host state.
1.2 Announce relevant changes.
1.3 Listen to the state of all other ganglia nodes via a unicast or multicast channel.
1.4 Answer requests for an XML description of the cluster state.
Each gmond transmits information in two different ways:
a. Unicasting or Multicasting host state in external data representation (XDR) format using UDP messages.
b. Sending XML over a TCP connection.
Notes about gmond:
– The main configuration file of gmond is /etc/gmond.conf
– gmond is multithreaded
Test gmond installation:
telnet localhost 8649
You should see XML that conforms to the ganglia XML spec.
Or
gmond -d 5 -c /etc/ganglia/gmond.conf
to see the service in debugging mode.
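If telnet is not available, netcat works just as well for the same check (assuming nc is installed):
nc localhost 8649 | head -n 20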
2. Ganglia Meta Daemon (gmetad)
The ganglia meta daemon (gmetad) is a service that collects data from other gmetad and gmond sources and stores their state to disk in indexed round-robin (RRD) databases. Gmetad provides a simple query mechanism for collecting specific information about groups of machines.
Notes about gmetad:
– the main configuration file for gmetad is /etc/gmetad.conf
– You need at least one node with the gmetad daemon installed in each cluster.
– This gmetad daemon is the one that collects the data sent by the gmond daemons.
– The other nodes in the cluster do not need the gmetad daemon installed.
– If the machine running gmetad should itself be monitored as a node, install both gmond and gmetad on it.
Test gmetad installation:
telnet localhost 8651
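Another quick sanity check is to confirm that gmetad is actually writing RRD files (this is the default rrd_rootdir on most packages; the path may differ depending on /etc/gmetad.conf):
ls /var/lib/ganglia/rrds/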
3. Ganglia PHP Web Front-end
The Ganglia web front-end provides a view of the gathered information via real-time dynamic web pages. Implemented in PHP, it displays the Ganglia data in a meaningful way for system administrators and users.
In this picture we can see gmond installed on each node and sending data to gmetad installed on a “gmetad node”. We can have one or more nodes with gmetad in a cluster.
gmetad collects all the data from the gmond daemons and stores it in an RRDtool database, which is then read by the PHP scripts and displayed as shown in the first picture of this article.
4. Gmetrics
The ganglia metric tool (gmetric) is a command-line application that you can use to inject custom metrics about hosts that are being monitored by Ganglia. It can spoof messages as coming from a different host, in case you want to capture and report metrics from a device where you cannot run gmond (such as a network device or other embedded device).
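For example, to inject a hypothetical custom metric (the metric name, value and units below are just placeholders):
gmetric --name="app_queue_length" --value=42 --type=int32 --units="jobs"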
5. Gstat
The ganglia stat tool (gstat) is a command-line application that you can use to query a gmond daemon directly for metric information.
6. RRD tool
Ganglia uses RRDtool for data storage and visualization.
RRDtool is short for Round Robin Database tool, a widely used open source database that stores data as time series. For example, RRDtool will store CPU load values sampled at a fixed interval and can then graph that data over time.
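As a minimal illustration of what Ganglia does behind the scenes (the file and metric names here are hypothetical, not the ones Ganglia itself uses):
# create a database for one gauge metric, sampled every 60 seconds, keeping 1 day of per-minute averages
rrdtool create cpu_load.rrd --step 60 DS:load:GAUGE:120:0:U RRA:AVERAGE:0.5:1:1440
# store a sample with the current timestamp
rrdtool update cpu_load.rrd N:0.42
# graph the last 24 hours into a PNG
rrdtool graph cpu_load.png --start -86400 DEF:l=cpu_load.rrd:load:AVERAGE 'LINE1:l#FF0000:cpu load'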
1) Build the package with the provided pom.xml:
$ mvn package
2) Rebuild the RPM structure:
$ mvn -DskipTests=true rpm:rpm
A structure like the following will be created:
/target/rpm/<app_name>/BUILD
/target/rpm/<app_name>/RPMS
/target/rpm/<app_name>/SOURCES
/target/rpm/<app_name>/SPECS
/target/rpm/<app_name>/SRPMS
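The resulting package lands under the RPMS directory and can be inspected with rpm before installing (the architecture subdirectory may be noarch or x86_64 depending on the spec):
$ rpm -qpl target/rpm/<app_name>/RPMS/noarch/<app_name>-*.rpm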
If you need to add Elasticsearch and Kibana to an EMR cluster, please have a look at this post I have written for Amazon AWS:
It contains all the steps to launch a cluster and perform basic tests on both tools.
Additionally, here you will find the source code for the bootstrap actions used to configure Elasticsearch and Kibana on the EMR Hadoop cluster:
https://github.com/awslabs/emr-bootstrap-actions/tree/master/elasticsearch
