Java change default version / cambiar la version Java por defecto

Posted on August 5, 2016 by hvivani

If we have more than one Java version installed on your Linux server (Redhat flavor) you can change defaults using ‘alternatives’ command:

[hadoop@ip-172-31-36-252 ~]$ sudo /usr/sbin/alternatives --config java

There are 2 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
   2           /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java

Enter to keep the current selection[+], or type selection number: 2
[hadoop@ip-172-31-36-252 ~]$ sudo /usr/sbin/alternatives --config java

There are 2 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*  1           /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
 + 2           /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java

Enter to keep the current selection[+], or type selection number:
[hadoop@ip-172-31-36-252 ~]$

monitoring HTTP requests on the fly

Posted on July 27, 2016 by hvivani

Install httpry:

sudo yum install httpry

$ sudo yum install gcc make git libpcap-devel
$ git clone https://github.com/jbittel/httpry.git
$ cd httpry
$ make
$ sudo make install

then run:

sudo httpry -i eth0

Output will be like:

httpry version 0.1.8 -- HTTP logging and information retrieval tool
Copyright (c) 2005-2014 Jason Bittel <jason.bittel@gmail.com>
Starting capture on eth0 interface
2016-07-27 14:20:59.598    172.31.43.18    169.254.169.254    >    GET    169.254.169.254    /latest/dynamic/instance-identity/document    HTTP/1.1    -    -
2016-07-27 14:20:59.599    169.254.169.254    172.31.43.18    <    -    -    -    HTTP/1.0    200    OK
2016-07-27 14:22:02.034    172.31.43.18    169.254.169.254    >    GET    169.254.169.254    /latest/dynamic/instance-identity/document    HTTP/1.1    -    -
2016-07-27 14:22:02.034    169.254.169.254    172.31.43.18    <    -    -    -    HTTP/1.0    200    OK
2016-07-27 14:23:04.640    172.31.43.18    169.254.169.254    >    GET    169.254.169.254    /latest/dynamic/instance-identity/document    HTTP/1.1    -    -
2016-07-27 14:23:04.640    169.254.169.254    172.31.43.18    <    -    -    -    HTTP/1.0    200    OK
2016-07-27 14:24:07.122    172.31.43.18    169.254.169.254    >    GET    169.254.169.254    /latest/dynamic/instance-identity/document    HTTP/1.1    -    -
2016-07-27 14:24:07.123    169.254.169.254    172.31.43.18    <    -    -    -    HTTP/1.0    200    OK

Control Characters on vi Linux editor

Posted on July 25, 2016 by hvivani

Show hidden/control characters on vi:

:set list

Hide hidden/control characters on vi:

:set nolist

Replace hidden/control characters on vi:

:%s/^M//g

:%s/.$//g

Indexing Common Crawl Metadata on Elasticsearch using Cascading

Posted on May 28, 2015 by hvivani

If you want to explore how to parallelize the data ingestion into Elasticsearch, please have a look to this post I have written for Amazon AWS:

http://blogs.aws.amazon.com/bigdata/post/TxC0CXZ3RPPK7O/Indexing-Common-Crawl-Metadata-on-Amazon-EMR-Using-Cascading-and-Elasticsearch

It explains how to index Common Crawl metadata into Elasticsearch using Cascading connector directly from the S3 data source.

Cascading Source Code is available here.

Back to the basics: Creating a SPEC file from a Maven project

Posted on May 5, 2015 by hvivani

1) Build the package with the provided pom.xml:

$ mvn package

2) Rebuild the RPM structure:

$ mvn -DskipTests=true rpm:rpm

A structure like the following will be created:

/target/rpm/<app_name>/BUILD
/target/rpm/<app_name>/RPMS
/target/rpm/<app_name>/SOURCES
/target/rpm/<app_name>/SPECS
/target/rpm/<app_name>/SRPMS

Elasticsearch and Kibana on EMR Hadoop cluster

Posted on January 31, 2015 by hvivani

If you need to add Elasticsearch and Kibana on EMR, please have a look to this post I have written for Amazon AWS:

http://blogs.aws.amazon.com/bigdata/post/Tx1E8WC98K4TB7T/Getting-Started-with-Elasticsearch-and-Kibana-on-Amazon-EMR

It contains all the steps to launch a cluster and perform the basic testings on both tools.

Additionally, here you will find the source code for the bootstrap actions used to configure Elasticsearch and Kibana on the EMR Hadoop cluster:

https://github.com/awslabs/emr-bootstrap-actions/tree/master/elasticsearch

Versión en Español

Si necesitas Elasticsearch y Kibana instalado en un cluster EMR, por favor, mira esta publicacion que he escrito para Amazon AWS:

http://blogs.aws.amazon.com/bigdata/post/Tx1E8WC98K4TB7T/Getting-Started-with-Elasticsearch-and-Kibana-on-Amazon-EMR

Contiene todos los pasos para crear un cluster y realizar las pruebas basicas en las dos herramientas.

Adicionalmente, aqui encontraras el codigo fuente para las bootstrap actions que uso para instalar Elasticsearch y Kibana en el EMR Hadoop cluster.

https://github.com/awslabs/emr-bootstrap-actions/tree/master/elasticsearch

Create a really big file / Crear un archivo realmente grande

Posted on December 25, 2014 by hvivani

This is sometimes useful when playing with bigdata.

Instead of a dd command and wait the file being created block by clock, we can run:

$ fallocate -l 200G /mnt/reallyBigFile.csv

It essentially “allocates” all of the space you’re seeking, but it doesn’t bother to write anything. So, when you use fallocate to create a 200 GB virtual drive space, you really do get a 200 GB file.

Instalando Maven en instancia Amazon EC2

Posted on December 16, 2014 by hvivani

Maven es una herramienta de software para la gestión y construcción de proyectos Java

Obtenemos maven:

$ wget http://apache.saix.net/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz

Descomprimimos:

$ tar -xzvf apache-maven-3.2.3-bin.tar.gz

Movemos la carpeta a un directorio de instalación permanente:

$ sudo mv /home/ec2-user/apache-maven-3.2.3 /usr/local/maven

Creamos link simbólico a la version current (por si instalamos otras versiones):

$ sudo ln -s /usr/local/maven /usr/local/maven/current

Modificamos el inicio de sesión del usuario para hacer disponible maven:

$ sudo vi ~/.bashrc

Añadimos las siguientes entradas:

export MAVEN_HOME=/usr/local/maven/current
export M2_HOME=/usr/local/maven/current
export M2=/usr/local/maven/current/bin
export PATH=/usr/local/maven/current/bin:$PATH

Cargamos el bash profile con esta modifiación:

$ source ~/.bashrc

Chequeamos la instalación:

$ mvn -version
Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9;2014-02-14T17:37:52+00:00)
Maven home: /usr/local/maven/current
Java version: 1.7.0_55, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.55.x86_64/jre

Apache Hive: dealing with Out of Memory and Garbage Collector errors.

Posted on December 6, 2014 by hvivani

This is the common error:

java.lang.OutOfMemoryError: GC overhead limit exceeded

This error will occur in several Java environments, but, in particular, with Hive, is pretty common when big structures or several thousands objects are stored in memory.

According to Sun, the error will raise if too much time is being spent in garbage collection:

If more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown.

This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small.

If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit.

Also, we can increase the Heap Size, via “-Xmx1024m” option.

Another interesting option is the Concurrent Collector “UseConcMarkSweepGC“: It performs most of its work concurrently (i.e., while the application is still running) to keep garbage collection pauses short.

It is designed for applications with medium to large-sized data sets for which response time is more important than overall throughput, since the techniques used to minimize pauses can reduce application performance.

In Hive, we can set thes parameters with a command like this:

SET mapred.child.java.opts="-server -Xmx1g -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit";

An alternative to avoid storing all the structure in memory:

Write intermediate results to a temporary table in the database instead of hashmaps, a database table table is not memory bound, so use an indexed table is a solution in many cases.

When the intermediate table is complete, execute a sql statement(s) from it instead of from memory.

Mandus Momberg’s Blog ! – the beauty of BASH

Posted on December 2, 2014 by hvivani

I would like to share with you a new awesome blog from an awesome professional:

http://blog.mandusmomberg.com/

And… as a first post, a nice one, about the beauty of BASH:

http://blog.mandusmomberg.com/blog/2014/12/01/o-what-a-beautiful-bashing/

Enjoy !

Hernan Vivani's Blog

Linux, Big Data, AWS, Astronomy, Running, Cycling… and more

Tag Archives: fedora