How to: Move a running Linux process to a screen session


Use case:

  • You just started a process (e.g. a compile, a copy, etc.).
  • You noticed it will take much longer than expected to finish.
  • You cannot abort it, or risk it being killed when the current shell session ends.
  • It would be ideal to have the process inside ‘screen’ so it keeps running in the background.

We can move it to a screen session with the following steps (a condensed example follows the list):

  1. Suspend the process
    1. press Ctrl+Z
  2. Resume the process in the background
    1. bg
  3. Disown the process
    1. disown %1
  4. Launch a screen session
    1. screen
  5. Find the PID of the process
    1. pgrep myappname
  6. Use reptyr to take over the process
    1. reptyr 1234
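
Putting it all together, a condensed session might look like this (the build command and the PID 1234 are just illustrative placeholders):

$ make -j8      # the long-running job started in the current shell
^Z              # Ctrl+Z suspends it
$ bg            # resume it in the background
$ disown %1     # detach it from this shell's job table
$ screen        # open a screen session
$ pgrep make    # find the PID, e.g. 1234
$ reptyr 1234   # re-attach the process to the screen session's terminal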

 

Note: at the time of writing, reptyr is not available in any Fedora/Red Hat repo, so we’ll need to compile it from source:

$ git clone https://github.com/nelhage/reptyr.git
$ cd reptyr/
$ make
$ sudo make install
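
Note (an assumption about your kernel settings, not something covered above): some distros enable Yama ptrace hardening, in which case reptyr may refuse to attach with a permissions error. Temporarily relaxing the restriction usually helps:

$ sudo sysctl -w kernel.yama.ptrace_scope=0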

Getting the latest EMR release label


The latest release label is usually published on EMR’s What’s New page.

So one way to get the latest EMR release label is:

 

curl -s https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-whatsnew.html |grep "(Latest)"|head -n1|awk '{ print $3 }'
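
If you want to consume that value from a script, a small sketch could look like this (the create-cluster line is only an illustration and assumes the AWS CLI and the default EMR roles are already set up):

LATEST=$(curl -s https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-whatsnew.html | grep "(Latest)" | head -n1 | awk '{ print $3 }')
echo "Latest EMR release label: $LATEST"
# e.g. launch a small cluster with it:
# aws emr create-cluster --use-default-roles --release-label "$LATEST" --instance-type m5.xlarge --instance-count 3 --applications Name=Hadoop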

 

Have fun!

Update protobuf to 2.5 on CentOS 6


Update protobuf to 2.5 on CentOS 6:

$ cd /usr/local/src/
$ wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
$ tar xvf protobuf-2.5.0.tar.gz
$ cd protobuf-2.5.0
$ ./autogen.sh
$ ./configure --prefix=/usr
$ make
$ sudo make install
$ protoc --version

Install protobuf for Java

$ cd java
$ mvn install
$ mvn package

You should be good to go.

If you want to be able to install several protobuf versions side by side, install stow and change ./configure --prefix=/usr to ./configure --prefix=/usr/local/stow/protobuf-2.5.0, as in the sketch below.
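
For example, the stow-based build of the same source tree would look like this:

$ cd /usr/local/src/protobuf-2.5.0
$ ./configure --prefix=/usr/local/stow/protobuf-2.5.0
$ make
$ sudo make install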

Then link protobuf into your system with stow:

$ cd /usr/local/stow
$ stow protobuf-2.5.0

Note: stow uses /usr/local/bin by default. Make sure that’s in your $PATH.

To unlink that version of protobuf:

$ stow -D protobuf-2.5.0

Upgrading glibc from version 2.12 to 2.14 on CentOS


You cannot safely update glibc on CentOS 6, but you can easily install 2.14 alongside 2.12 and use it to compile projects (a linking sketch follows the steps).

  1. mkdir ~/glibc_install; cd ~/glibc_install
  2. wget http://ftp.gnu.org/gnu/glibc/glibc-2.14.tar.gz
  3. tar zxvf glibc-2.14.tar.gz
  4. cd glibc-2.14
  5. mkdir build
  6. cd build
  7. ../configure --prefix=/opt/glibc-2.14
  8. make -j4
  9. sudo make install
  10. export LD_LIBRARY_PATH=/opt/glibc-2.14/lib
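
To actually build and link a program against the alternate glibc you usually need to point the linker at it explicitly, not just set LD_LIBRARY_PATH. A minimal sketch (myapp.c is just a placeholder; the dynamic linker path assumes an x86_64 install under /opt/glibc-2.14):

$ gcc -o myapp myapp.c \
    -Wl,--rpath=/opt/glibc-2.14/lib \
    -Wl,--dynamic-linker=/opt/glibc-2.14/lib/ld-linux-x86-64.so.2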

 

Re-indexing the Outlook Spotlight index on Mac


If you search in Outlook and get no results, or only partial results, the Spotlight index is most probably corrupted.

How to fix it:

  1. Restart the Mac, so that it restarts the Spotlight services.
  2. Navigate to Finder > Applications > Utilities > Terminal.
  3. Type mdimport -L.
  4. Important: If you see more than one instance of “Microsoft Outlook Spotlight Importer.mdimporter,” delete the Outlook application that you are not using, empty it from the Trash, restart your Mac, and go back to step 1.
  5. In the Terminal, reindex your Outlook database by using the following command and substituting your own user name for the <user_name> placeholder:

    mdimport -g "/Applications/Microsoft Outlook.app/Contents/Library/Spotlight/Microsoft Outlook Spotlight Importer.mdimporter" -d1 "/Users/<user_name>/Library/Group Containers/UBF8T346G9.Office/Outlook/Outlook 15 Profiles/<my_profile_name>"

    Note: In this command, the path after "-g" is the default path of the Outlook installation, and the path after "-d1" is the default path of your profile, where <my_profile_name> is "Main Profile" by default. Substitute your actual paths if you have renamed your profile or installed Outlook in a different location.

  6. Reindexing will take some time. After it finishes, quit and then restart Outlook.

Have a look at the source of this information for further guidance: https://support.microsoft.com/en-us/help/2741535/outlook-for-mac-search-returns-no-results-and-task-items-are-not-displ

Adding a mount point to HDFS


Before proceeding:

This procedure assumes that you don’t have any useful data currently on HDFS: all data will be lost after adding mount points with this method.

This procedure should be applied to every datanode in the cluster. No intervention on the master node is needed if the framework is configured properly.

#checking available block devices:
[ec2-user@ip-10-0-15-76 media]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme2n1 259:4 0 2.5T 0 disk
nvme1n1 259:3 0 2.5T 0 disk /media/ebs0
nvme4n1 259:6 0 2.5T 0 disk
nvme0n1 259:0 0 2G 0 disk
├─nvme0n1p1 259:1 0 2G 0 part /
└─nvme0n1p128 259:2 0 1M 0 part
nvme3n1 259:5 0 2.5T 0 disk

#checking formatted filesystem:
[ec2-user@ip-10-0-15-76 media]$ sudo file -s /dev/nvme2n1
/dev/nvme2n1: data

(this device has no filesystem yet)

#formatting to ext4:
[ec2-user@ip-10-0-15-76 media]$ sudo mkfs -t ext4 /dev/nvme2n1
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 655360000 4k blocks and 163840000 inodes
Filesystem UUID: 6d9c997f-d47b-4529-85c8-e56e8ef47a1d
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

#mounting
[ec2-user@ip-10-0-15-76 media]$ sudo mkdir /media/ebs1
[ec2-user@ip-10-0-15-76 media]$ sudo mount /dev/nvme2n1 /media/ebs1
[ec2-user@ip-10-0-15-76 media]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme2n1 259:4 0 2.5T 0 disk /media/ebs1
nvme1n1 259:3 0 2.5T 0 disk /media/ebs0
nvme4n1 259:6 0 2.5T 0 disk
nvme0n1 259:0 0 2G 0 disk
├─nvme0n1p1 259:1 0 2G 0 part /
└─nvme0n1p128 259:2 0 1M 0 part
nvme3n1 259:5 0 2.5T 0 disk

#final mount result
[ec2-user@ip-10-0-60-46 ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme2n1 259:4 0 2.5T 0 disk /media/ebs1
nvme1n1 259:3 0 2.5T 0 disk /media/ebs0
nvme4n1 259:6 0 2.5T 0 disk /media/ebs3
nvme0n1 259:0 0 2G 0 disk
├─nvme0n1p1 259:1 0 2G 0 part /
└─nvme0n1p128 259:2 0 1M 0 part
nvme3n1 259:5 0 2.5T 0 disk /media/ebs2

#checking mount points in hdfs-site.xml
[ec2-user@ip-10-0-60-46 media]$ cat /opt/hadoop-2.7.3/etc/hadoop/hdfs-site.xml |grep -A1 dfs.datanode.data.dir
<name>dfs.datanode.data.dir</name>
<value>/media/ebs0/hadoop/datanodes,/media/ebs1/hadoop/datanodes,/media/ebs2/hadoop/datanodes,/media/ebs3/hadoop/datanodes</value>

# create defined directory structure on mount point (for each mount point):
sudo mkdir -p /media/ebs1/hadoop/datanodes

# modify owner to the user that will start DFS (for each mount point):
sudo chown -R ec2-user:ec2-user /media/ebs1/hadoop/datanodes
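
# (optional, hedged shortcut) prepare all four mount points from hdfs-site.xml in one loop:
for d in /media/ebs0 /media/ebs1 /media/ebs2 /media/ebs3; do
  sudo mkdir -p $d/hadoop/datanodes
  sudo chown -R ec2-user:ec2-user $d/hadoop/datanodes
done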

#format namenode:
hadoop namenode -format

# stop/start DFS:
/opt/hadoop-2.7.3/sbin/stop-dfs.sh
/opt/hadoop-2.7.3/sbin/start-dfs.sh

# check service start status
tail -f /var/log/hadoop/hadoop-ec2-user-datanode-ip-10-0-15-76.log
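
Two optional follow-ups, sketched from the listing above: the mounts are not persisted, so they will disappear on reboot unless you also add them to /etc/fstab (the UUID comes from blkid and will differ per volume), and you can confirm that the datanodes picked up the new directories with dfsadmin:

# get the filesystem UUID (the example value is the one from the mkfs output above)
$ sudo blkid /dev/nvme2n1
/dev/nvme2n1: UUID="6d9c997f-d47b-4529-85c8-e56e8ef47a1d" TYPE="ext4"

# /etc/fstab entry, one per mount point:
UUID=6d9c997f-d47b-4529-85c8-e56e8ef47a1d  /media/ebs1  ext4  defaults,nofail  0  2

# check that all datanodes report the expected capacity:
$ hdfs dfsadmin -report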

 

Some environment variables I usually use in these environments:

export HADOOP_SSH_OPTS="-i /home/ec2-user/.ssh/mykey -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.151.x86_64/jre

HBase and Zookeeper debugging


I came across some scenarios where an application (e.g. MapReduce) communicating with HBase through YARN could silently fail with a timeout like the following:

2017-01-30 19:42:03,657 DEBUG [main] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: locateRegionInMeta parentTable=hbase:meta, metaLocation=, attempt=9 of 35 failed; retrying after sleep of 10095 because: Failed after attempts=36, exceptions:
Mon Jan 30 19:42:03 UTC 2017, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68463: row 'test2,#cmrNo acctNo,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-172-31-3-246.us-west-2.compute.internal,16000,1485539268192, seqNum=0

The root cause of this behavior wasn’t any misconfiguration on the server/networking side, but a library missing from the classpath.

When there is a ZooKeeper issue, depending on the retry parameters, the underlying exceptions may not be visible.

In this case, I added/modified the following parameters in the MapReduce Java application, which gave more visibility into the communication layer between ZooKeeper and HBase:

conf.set("hbase.client.retries.number", Integer.toString(1));
conf.set("zookeeper.session.timeout", Integer.toString(60000));
conf.set("zookeeper.recovery.retry", Integer.toString(1));


After this, the following exception was visible:

Exception: java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge
 at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:157)
 at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: com.google.protobuf.ServiceException: java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge

 

Playing with these parameters will make the application exit quickly when there is a problem with the cluster, which can be desirable in a production environment.

Reducing the parameters to more conservative values could yield better recovery times. Setting zookeeper.recovery.retry to 0 will still result in up to two connection attempts to every ZooKeeper server in the quorum, and will cause an application failure in under a minute should ZooKeeper connectivity be lost during execution.

 

As an additional note, if you are receiving timeouts because the application is trying to contact localhost instead of the quorum servers, you can set these parameters explicitly:

// HBase through MR on YARN is trying to connect to localhost instead of the quorum.
conf.set("hbase.zookeeper.quorum","172.31.3.246");
conf.set("hbase.zookeeper.property.clientPort","2181");

 

I’ve added a couple of examples of MapReduce applications for HBase here: https://github.com/hvivani/bigdata/tree/master/hbase

 

Some additional notes on this behavior: https://discuss.pivotal.io/hc/en-us/articles/200933006-Hbase-application-hangs-indefinitely-connecting-to-zookeeper