vi: save with root permissions using sudo


Just:

:w !sudo tee %

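% expands to the name of the current file.

:w !sudo tee % pipes the buffer contents to tee, which runs with administrator privileges and writes them to the current file on disk. Vim itself does not write the file, so it still considers the buffer modified.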

That’s why you will see a warning like this when using the command:

W12: Warning: File "/etc/myfile.txt" has changed and the buffer was changed in Vim as well

Thanks, Mandus, for this! I feel better now!

 

 

MapReduce: Compression and Input Splits


This is something that always raises doubts:

When compressed data is going to be processed by MapReduce, it is important to check whether the compression format supports splitting. If it does not, the number of map tasks may not be what you expect.

Suppose an uncompressed file of 1 GB is stored in HDFS. With an HDFS block size of 64 MB, the file will be stored as 16 blocks, and a MapReduce job using this file as input will create 16 input splits, each processed independently as input to a separate map task.

Now suppose the file is gzip-compressed and its compressed size is 1 GB. As before, HDFS will store the file as 16 blocks. But creating a split for each block will not work, since it is impossible to start reading at an arbitrary point in the gzip stream, and therefore impossible for a map task to read its split independently of the others.

In this case, MapReduce will not try to split the gzipped file, since it knows that the input is gzip-compressed (by looking at the filename extension) and that gzip does not support splitting.

In this scenario a single map task will process all 16 HDFS blocks, most of which will not be local to that task, so it also pays a data locality cost.

This job will not parallelize as expected: it will be less granular, and so may take longer to run.
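The arithmetic of the two scenarios can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption (one split per HDFS block for splittable input, a single split otherwise); the function name expected_map_tasks is hypothetical and not part of Hadoop.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def expected_map_tasks(file_size_bytes, block_size_bytes, splittable):
    """Estimate the number of input splits (and so map tasks) for one file.

    Simplifying assumption: one split per HDFS block when the format is
    splittable, and a single split for the whole file when it is not
    (for example, a gzip-compressed file).
    """
    if not splittable:
        return 1
    return math.ceil(file_size_bytes / block_size_bytes)


# 1 GB file, 64 MB blocks, as in the example above.
uncompressed_tasks = expected_map_tasks(1 * GB, 64 * MB, splittable=True)
gzip_tasks = expected_map_tasks(1 * GB, 64 * MB, splittable=False)
print("uncompressed:", uncompressed_tasks, "gzip:", gzip_tasks)
```

With these numbers the uncompressed file yields 16 map tasks and the gzip file only one, which is the loss of parallelism described above.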

The gzip format uses DEFLATE to store the compressed data, and DEFLATE stores data as a series of compressed blocks. The problem is that the start of each block is not distinguished in any way that would allow a reader positioned at an arbitrary point in the stream to advance to the beginning of the next block, thereby synchronizing itself with the stream. For this reason, gzip does not support splitting.

Here we have a summary of compression formats:

[Table image: hadoop_spplitable_formats]

(a) DEFLATE is a compression algorithm whose standard implementation is zlib. There is no commonly available command-line tool for producing files in DEFLATE format, as gzip is normally used. (Note that the gzip file format is DEFLATE with extra headers and a footer.) The .deflate filename extension is a Hadoop convention.

Source: Hadoop: The Definitive Guide.

 

yarn: change configuration and restart node manager on a live cluster


This procedure changes the YARN configuration on a live cluster, propagates the change to all the nodes, and restarts the YARN NodeManager.

Both commands list all the nodes in the cluster and then extract the DNS names, running a remote command on each one via SSH. You can customize the sed filter to suit your own needs; this one matches DNS names in the Elastic MapReduce format (ip-xx-xx-xx-xx.eu-west-1.compute.internal).

1. Upload the private key (.pem) file that you use to access the cluster to the master node. Restrict the private key permissions to 600 (e.g. chmod 600 MyKeyName.pem).

2. Edit ~/conf/yarn-site.xml and use a command like this to propagate the change across the cluster.

yarn node -list|sed -n "s/^\(ip[^:]*\):.*/\1/p" | xargs -t -I{} -P10 scp -o StrictHostKeyChecking=no -i ~/MyKeyName.pem ~/conf/yarn-site.xml hadoop@{}:/home/hadoop/conf/

3. This command restarts the YARN NodeManager on all the nodes.

yarn node -list|sed -n "s/^\(ip[^:]*\):.*/\1/p" | xargs -t -I{} -P10 ssh -o StrictHostKeyChecking=no -i ~/MyKeyName.pem hadoop@{} "yarn nodemanager stop"
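To make the sed filter used in both commands easier to adapt, here is a rough Python equivalent. It is a sketch only: the function name extract_hosts is hypothetical, and the sample text is an invented approximation of `yarn node -list` output, whose real layout may differ.

```python
import re

# Same idea as the sed expression: keep only lines that start with "ip"
# and capture everything up to (not including) the first colon.
NODE_PATTERN = re.compile(r"^(ip[^:]*):")


def extract_hosts(node_list_output):
    """Return the host names that the sed filter would print, in order."""
    hosts = []
    for line in node_list_output.splitlines():
        match = NODE_PATTERN.match(line)
        if match:
            hosts.append(match.group(1))
    return hosts


# Hypothetical sample; real `yarn node -list` output may look different.
SAMPLE_OUTPUT = """Total Nodes:2
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
ip-10-0-0-11.eu-west-1.compute.internal:9103 RUNNING ip-10-0-0-11.eu-west-1.compute.internal:9035 0
ip-10-0-0-12.eu-west-1.compute.internal:9103 RUNNING ip-10-0-0-12.eu-west-1.compute.internal:9035 0
"""

for host in extract_hosts(SAMPLE_OUTPUT):
    print(host)
```

Header lines do not start with "ip", so they are dropped, and only the host part before the port is kept. To match a different naming scheme, only the pattern needs to change.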

 

Hadoop 1 vs Hadoop 2 – How many slots do I have per node?


This is a topic that always raises discussion…

In Hadoop 1, the number of tasks launched per node was specified via the settings mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.

But these settings are ignored when set on Hadoop 2.

In Hadoop 2 with YARN, we can determine how many concurrent tasks are launched per node by dividing the resources allocated to YARN by the resources allocated to each MapReduce task, and taking the minimum of the two types of resources (memory and CPU).

This approach is an improvement over that of Hadoop 1, because the administrator no longer has to bundle CPU and memory into a Hadoop-specific concept of a “slot”.

The number of tasks that will be spawned per node:

min(
    yarn.nodemanager.resource.memory-mb / mapreduce.[map|reduce].memory.mb
    ,
    yarn.nodemanager.resource.cpu-vcores / mapreduce.[map|reduce].cpu.vcores
    )

The resulting value can be set in the variable ‘mapreduce.job.maps‘ in the ‘mapred-site.xml‘ file.

Of course, YARN is more dynamic than that, and each job can have unique resource requirements — so in a multitenant cluster with different types of jobs running, the calculation isn’t as straightforward.
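As a worked example of the formula, here is a minimal Python sketch. The function name tasks_per_node and the node and task sizes are hypothetical values chosen for illustration, not recommendations.

```python
def tasks_per_node(node_memory_mb, node_vcores, task_memory_mb, task_vcores):
    """Concurrent tasks per node: the smaller of the memory and CPU limits.

    node_memory_mb -> yarn.nodemanager.resource.memory-mb
    node_vcores    -> yarn.nodemanager.resource.cpu-vcores
    task_memory_mb -> mapreduce.[map|reduce].memory.mb
    task_vcores    -> mapreduce.[map|reduce].cpu.vcores
    """
    by_memory = node_memory_mb // task_memory_mb
    by_cpu = node_vcores // task_vcores
    return min(by_memory, by_cpu)


# Hypothetical node: 8192 MB and 8 vcores handed to YARN.
map_tasks = tasks_per_node(8192, 8, task_memory_mb=1024, task_vcores=1)
reduce_tasks = tasks_per_node(8192, 8, task_memory_mb=2048, task_vcores=1)
print("map tasks per node:", map_tasks)
print("reduce tasks per node:", reduce_tasks)
```

In this example the map tasks are limited equally by memory and CPU (8 each), while the reduce tasks are limited by memory (4), since each one asks for twice as much.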

More information:
http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/

Testing whether Java Cryptography Extension (JCE) is installed


If JCE is already installed, you should see the jar files ‘local_policy.jar’ and ‘US_export_policy.jar’ in $JAVA_HOME/jre/lib/security/

But we can test it:

import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

class TestJCE {
    public static void main(String[] args) {
        boolean jceSupported;
        try {
            // Generating a 256-bit key succeeds even under the restricted
            // policy, so ask the Cipher class for the maximum AES key length
            // that the installed policy files allow.
            jceSupported = Cipher.getMaxAllowedKeyLength("AES") >= 256;
        } catch (NoSuchAlgorithmException e) {
            jceSupported = false;
        }
        System.out.println("JCE Supported=" + jceSupported);
    }
}

To compile (assuming the file name is TestJCE.java):

$ javac TestJCE.java

The previous command creates the TestJCE.class output file.

To run the program:

$ java TestJCE

 

Update OpenSSL to 1.0.1g


Update OpenSSL to the latest version in three steps:

1) Compile and install the latest version of OpenSSL:
$ curl https://www.openssl.org/source/openssl-1.0.1g.tar.gz | tar xz && cd openssl-1.0.1g && ./config && make && sudo make install_sw

2) Replace the old openssl binary with a symbolic link to the new one:
$ sudo ln -sf /usr/local/ssl/bin/openssl `which openssl`

3) Test:

$ openssl version

It should return:

OpenSSL 1.0.1g

 

Stress Test: Bees With Machine Guns!


A few days ago I tried an extremely interesting tool: Bees With Machine Guns!

It is a tool for stress testing the Load Balancer and Autoscaling services of Amazon AWS.

Once our server infrastructure is set up, we can generate a load test with the bees. This lets us watch the Autoscaling service in action, creating new instances of our server or removing instances when the load decreases.

Installation:

$ git clone git://github.com/newsapps/beeswithmachineguns.git
$ easy_install beeswithmachineguns

Create the credentials file:

[Credentials]
aws_access_key_id = <your access key>
aws_secret_access_key = <your secret key>

These credentials must be placed in the .boto file in our home directory, and contain the access key and secret key of our Amazon AWS account. The application uses them to create the bees, which are nothing more than EC2 instances.

Usage:

bees up -s 4 -g public-sg -k hvivani-virg-1
bees attack -n 10000 -c 250 -u http://loadbalancer.hvivani.com/
bees down

The first line creates 4 bees (EC2 instances) using the credentials from the .boto file, together with the permissions set in ‘public-sg’ (a Security Group defined in the region) and the key ‘hvivani-virg-1’ (the private key used to connect to any instance in the region).

The second line calls the 4 bees to attack the site http://loadbalancer.hvivani.com/ with 10000 requests, 250 at a time.

The last line removes the bees (terminates the attack instances).

Let's play …

Execute remote commands with sudo


A few days ago I needed to run a couple of commands on a remote server, for which the syntax looks like this:

$ ssh -p66 hvivani@server "cd /home/hvivani/backup/; ls -l"

Note that we separate the commands we want to run with “;”.

Now, what happens if I need to run something like this?

$ ssh -p66 hvivani@server "cd /etc;sudo vi sudoers"

We get the following error:

hvivani@server's password: 
sudo: sorry, you must have a tty to run sudo

To run remote commands with sudo over ssh, we must use the “-t” option, which forces allocation of a pseudo-terminal (tty) so that sudo can run:

$ ssh -t -p66 hvivani@server "cd /etc;sudo vi sudoers"
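If the same call needs to be scripted, the command line can be assembled programmatically. The following Python sketch only builds the argument list; the helper name build_ssh_command and its parameters are hypothetical, and the user, host and port are the ones from the example above.

```python
def build_ssh_command(user, host, port, remote_commands, allocate_tty=False):
    """Build the argument list for an ssh call that runs remote commands.

    remote_commands is a list of shell commands; they are joined with ";"
    so that they run one after another on the remote side.
    """
    argv = ["ssh"]
    if allocate_tty:
        # -t forces pseudo-terminal allocation, which sudo requires here.
        argv.append("-t")
    argv += ["-p", str(port), f"{user}@{host}", ";".join(remote_commands)]
    return argv


command = build_ssh_command(
    "hvivani", "server", 66, ["cd /etc", "sudo vi sudoers"], allocate_tty=True
)
print(command)
# To actually run it, pass the list to subprocess.call(command).
```

Passing the arguments as a list avoids local shell quoting problems, since the whole remote command string travels as a single argument.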