Secondary NameNode in Hadoop 2


This is a frequently asked question:

In Hadoop 2, the Secondary NameNode role can be handled in two ways:

1. With HA (High Availability cluster): if you are setting up an HA cluster, you do not need a Secondary NameNode, because the Standby NameNode keeps its state synchronized with the Active NameNode.

The HDFS NameNode High Availability feature enables you to run redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. Both NameNodes require the same type of hardware configuration. In an HA Hadoop cluster, the Active NameNode writes its edit log to a separate set of JournalNodes, and the Standby NameNode reads from them to stay in sync.

In the event of a failover, the Standby NameNode ensures that its namespace is fully up to date with the edit logs before it transitions to the active state, so there is no need for a Secondary NameNode in this cluster setup.
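A quick way to check which NameNode is currently active (and to test a failover) is the hdfs haadmin tool. A minimal sketch, assuming the NameNode service IDs are nn1 and nn2 as defined in dfs.ha.namenodes.<nameservice>:

# check the state of each NameNode (service IDs nn1/nn2 are illustrative)
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# trigger a manual failover from nn1 to nn2
hdfs haadmin -failover nn1 nn2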

2. Without HA: you can have a Hadoop setup without a Standby NameNode. In that case the Secondary NameNode works just as it did in Hadoop 1.x, periodically merging the edit log into the fsimage checkpoint.

 

Source: https://stackoverflow.com/questions/37830777/use-of-secondary-namenode-in-hadoop-in-2-x

Adding a mount point to HDFS


Before proceeding:

This procedure assumes you have no useful data currently stored on HDFS: all data will be lost after adding mount points with this method.

This procedure must be applied on every DataNode in the cluster. No intervention on the master node is needed if the framework is configured properly.

#checking available block devices:
[ec2-user@ip-10-0-15-76 media]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme2n1 259:4 0 2.5T 0 disk
nvme1n1 259:3 0 2.5T 0 disk /media/ebs0
nvme4n1 259:6 0 2.5T 0 disk
nvme0n1 259:0 0 2G 0 disk
├─nvme0n1p1 259:1 0 2G 0 part /
└─nvme0n1p128 259:2 0 1M 0 part
nvme3n1 259:5 0 2.5T 0 disk

#checking formatted filesystem:
[ec2-user@ip-10-0-15-76 media]$ sudo file -s /dev/nvme2n1
/dev/nvme2n1: data

(the “data” output means the device has no filesystem yet)

#formatting to ext4:
[ec2-user@ip-10-0-15-76 media]$ sudo mkfs -t ext4 /dev/nvme2n1
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 655360000 4k blocks and 163840000 inodes
Filesystem UUID: 6d9c997f-d47b-4529-85c8-e56e8ef47a1d
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

#mounting
[ec2-user@ip-10-0-15-76 media]$ sudo mkdir /media/ebs1
[ec2-user@ip-10-0-15-76 media]$ sudo mount /dev/nvme2n1 /media/ebs1
[ec2-user@ip-10-0-15-76 media]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme2n1 259:4 0 2.5T 0 disk /media/ebs1
nvme1n1 259:3 0 2.5T 0 disk /media/ebs0
nvme4n1 259:6 0 2.5T 0 disk
nvme0n1 259:0 0 2G 0 disk
├─nvme0n1p1 259:1 0 2G 0 part /
└─nvme0n1p128 259:2 0 1M 0 part
nvme3n1 259:5 0 2.5T 0 disk

#final mount result
[ec2-user@ip-10-0-60-46 ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme2n1 259:4 0 2.5T 0 disk /media/ebs1
nvme1n1 259:3 0 2.5T 0 disk /media/ebs0
nvme4n1 259:6 0 2.5T 0 disk /media/ebs3
nvme0n1 259:0 0 2G 0 disk
├─nvme0n1p1 259:1 0 2G 0 part /
└─nvme0n1p128 259:2 0 1M 0 part
nvme3n1 259:5 0 2.5T 0 disk /media/ebs2
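Optionally, the mounts can be made persistent across reboots by adding them to /etc/fstab. A minimal sketch for one mount point, using the filesystem UUID reported by blkid (the UUID below is the one printed by mkfs above; replace it with your own):

#get the filesystem UUID
sudo blkid /dev/nvme2n1

#append an fstab entry for /media/ebs1
echo 'UUID=6d9c997f-d47b-4529-85c8-e56e8ef47a1d /media/ebs1 ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab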

#checking mount points in hdfs-site.xml
[ec2-user@ip-10-0-60-46 media]$ cat /opt/hadoop-2.7.3/etc/hadoop/hdfs-site.xml |grep -A1 dfs.datanode.data.dir
<name>dfs.datanode.data.dir</name>
<value>/media/ebs0/hadoop/datanodes,/media/ebs1/hadoop/datanodes,/media/ebs2/hadoop/datanodes,/media/ebs3/hadoop/datanodes</value>

# create defined directory structure on mount point (for each mount point):
sudo mkdir -p /media/ebs1/hadoop/datanodes

# modify owner to the user that will start DFS (for each mount point):
sudo chown -R ec2-user:ec2-user /media/ebs1/hadoop/datanodes

#format namenode:
hadoop namenode -format

# stop/start DFS:
/opt/hadoop-2.7.3/sbin/stop-dfs.sh
/opt/hadoop-2.7.3/sbin/start-dfs.sh

# check service start status
tail -f /var/log/hadoop/hadoop-ec2-user-datanode-ip-10-0-15-76.log
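Once DFS is back up, it is worth confirming that the DataNode picked up the new directories and that the cluster capacity grew accordingly; for example:

# check configured capacity and per-DataNode status
/opt/hadoop-2.7.3/bin/hdfs dfsadmin -report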

 

Some environment variables I usually use in these environments:

export HADOOP_SSH_OPTS="-i /home/ec2-user/.ssh/mykey -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.151.x86_64/jre

s3:// vs s3n:// vs s3a:// vs EMRFS


s3://

Apache Hadoop implementation of a block-based filesystem backed by S3. Apache Hadoop has deprecated use of this filesystem as of May 2016.

s3n://

A native filesystem for reading and writing regular files on S3. S3N allows Hadoop to access files on S3 that were written with other tools, and conversely, other tools can access files written to S3N using Hadoop. S3N is stable and widely used, but it is not being updated with any new features. S3N requires a suitable version of the jets3t JAR on the classpath.

  • Uses jets3t

s3a://

Hadoop’s successor to the S3N filesystem. S3A uses Amazon’s libraries to interact with S3. S3A supports accessing files larger than 5 GB, and it provides performance enhancements and other improvements. For Apache Hadoop, S3A is the successor to S3N and is backward compatible with S3N. Using Apache Hadoop, all objects accessible from s3n:// URLs should also be accessible from S3A by replacing the URL scheme.

  • Uses AWS SDK.
  • Amazon EMR does not currently support use of the Apache Hadoop S3A file system.
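As an illustration of the scheme swap described above, with Apache Hadoop (outside EMR) and the hadoop-aws module on the classpath, a bucket can typically be listed through S3A like this (bucket name and credentials are placeholders; the same properties can also be set in core-site.xml):

# list a bucket through S3A, passing credentials as runtime properties
hadoop fs -Dfs.s3a.access.key=<access-key> -Dfs.s3a.secret.key=<secret-key> -ls s3a://my-bucket/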

EMRFS:

On Amazon EMR, both the s3:// and s3n:// URIs are associated with the EMR filesystem and are functionally interchangeable in the context of Amazon EMR. For consistency's sake, however, it is recommended to use the s3:// URI in the context of Amazon EMR.

EMRFS can be used by invoking the prefix s3n:// or s3:// or s3a:// depending on the client application implementation.

Source: https://aws.amazon.com/premiumsupport/knowledge-center/emr-file-system-s3/

HDFS: changing the replication factor


The replication factor is a property set in the HDFS configuration file that adjusts the global replication factor for the entire cluster. For each block stored in HDFS, there will be n − 1 duplicate blocks distributed across the cluster (where n is the replication factor).

The file conf/hdfs-site.xml is used to configure HDFS. Changing the dfs.replication property in hdfs-site.xml changes the default replication factor for all files subsequently placed in HDFS.

<property>
<name>dfs.replication</name>
<value>3</value>
</property>

You can also change the replication factor on a per-file basis using the Hadoop FS shell.

To set the replication factor to 1 for all the files in a directory:

hadoop fs -setrep -w 1 -R /my/dir
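The same command works on a single file; the replication factor of each file can then be seen in the second column of hadoop fs -ls (paths are illustrative):

hadoop fs -setrep -w 3 /my/dir/myfile.txt
hadoop fs -ls /my/dir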

Hadoop useful commands


– copyToLocal / copyFromLocal from/to S3:

$ bin/hadoop fs -copyToLocal s3://my-bucket/myfile.rb /home/hadoop/myfile.rb
$ bin/hadoop fs -copyFromLocal job5.avro s3://my-bucket/input

– Merge all the files from one folder into a single file (the --groupBy regex must contain a capture group; files with the same captured value are concatenated):

$ hadoop jar ~/lib/emr-s3distcp-1.0.jar --src s3://my-bucket/my-folder/ --dest s3://my-bucket/logs/all-the-files-merged.log --groupBy '.*(my-folder).*' --outputCodec none

– Create directory on HDFS:

$ bin/hadoop fs -mkdir -p /user/ubuntu

– List HDFS directory:

bin/hadoop fs -ls /

– Put a file in HDFS:

bin/hadoop dfs -put localfile.txt /user/hadoop/hadoopfile

– Check HDFS filesystem utilization:

$ bin/hadoop dfsadmin -report

– Cat of file on HDFS:

$ bin/hadoop dfs -cat /user/ubuntu/RESULTS/part-00000

More commands:

http://hadoop.apache.org/docs/r0.18.3/hdfs_shell.html

Hadoop: HDFS find / recover corrupt blocks


1) Search for corrupt files:

A command like ‘hadoop fsck /’ will show the status of the filesystem and any corrupt files. The following variant ignores lines with nothing but dots and lines about replication:

hadoop fsck / | egrep -v '^\.+$' | grep -v eplica
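In Hadoop 2.x, fsck can also print only the corrupt files and their blocks, which is easier to read than grepping the full report:

hadoop fsck / -list-corruptfileblocks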

2) Determine the corrupt blocks:

hadoop fsck /path/to/corrupt/file -locations -blocks -files

(Use that output to determine where the blocks live. If the file is larger than your block size, it will consist of multiple blocks.)

3) Try to copy the files to S3 with s3distcp or s3cmd. If that fails, you will have the option to run:

hadoop fsck / -move

which will move what is left of the corrupt files into /lost+found on HDFS.
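Whatever fsck moves can then be inspected directly on HDFS:

hadoop fs -ls /lost+found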

4) Delete the file:

hadoop fs -rm /path/to/file/with/permanently/missing/blocks

Check file system state again with step 1.

A more drastic command is:

hadoop fsck / -delete

which will search for and delete all corrupted files.

Hadoop should not serve corrupt blocks again unless the replication factor is low and there are not enough healthy replicas.

References:

http://hadoop.apache.org/docs/r0.19.0/commands_manual.html#fsck

HDFS: Cluster to cluster copy with distcp


This is the format of the distcp command to copy from HDFS to HDFS, with source and destination clusters running on Amazon AWS:

hadoop distcp "hdfs://ec2-54-86-202-252.compute-1.amazonaws.comec2-2:9000/tmp/test.txt" "hdfs://ec2-54-86-229-249.compute-1.amazonaws.comec2-2:9000/tmp/test1.txt"

More information about distcp:

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_7_2.html
http://hadoop.apache.org/docs/r1.2.1/distcp2.html

 

HDFS Architecture


The design of the HDFS filesystem is based on the Google File System (GFS).

– It can store very large amounts of data (terabytes or petabytes).

– It is designed to store data across a large number of machines.

– It implements data replication to cope with the malfunction or loss of machines in the cluster.

– To improve the interplay between Hadoop and MapReduce, HDFS allows data to be read and processed locally.

[Image: HDFS Architecture]

Input files are split into blocks of a fixed size (64 MB by default in older releases, 128 MB in Hadoop 2.x), which are stored in a distributed fashion across the Hadoop cluster. A file can be made up of several blocks, which are stored on different DataNodes (individual machines in the cluster) chosen at random. As a result, accessing a file usually requires access to multiple DataNodes, which means HDFS supports file sizes far larger than the disk capacity of a single machine.
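The block size actually configured on a cluster can be checked with hdfs getconf; for example:

# prints the configured block size in bytes (134217728 = 128 MB on Hadoop 2.x defaults)
hdfs getconf -confKey dfs.blocksize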

The NameNode stores all the filesystem metadata for the cluster, which means HDFS implements a master/slave architecture. A single NameNode (a primary server) manages the filesystem namespace and regulates client access to files. Having a single master greatly simplifies the system architecture, but its weakness is that it is a single point of failure. The NameNode serves as the sole arbiter and repository for all HDFS metadata.

Because the amount of metadata per file is relatively small (it only tracks file names, permissions, and the location of each block), the NameNode keeps all metadata in main memory, which allows fast random access. As a result, a NameNode with 4 GB of RAM can support a very large number of files and directories.

Several DataNodes serve blocks of the same file, which means a file can remain available even if one of those machines is lost: HDFS replicates each block across a number of machines (three, by default).

Each DataNode periodically sends a heartbeat to the NameNode. The NameNode marks DataNodes that have not sent a heartbeat for 10 minutes (the default) as dead and stops sending I/O requests to them. At that point, re-replication of the data held by the dead node begins, in order to maintain the replication factor (3 by default).

If the replication factor is 3, it means each piece of data has to be stored on 3 nodes at all times.