How to: Move a running Linux process to a screen session


Use case:

  • You just started a process (e.g. a compile, a copy, etc.).
  • You noticed it will take much longer than expected to finish.
  • You cannot abort it, or risk it being killed when the current shell session ends.
  • It would be ideal to have the process inside ‘screen’ so it keeps running in the background.

We can move it to a screen session with the following steps (a condensed example follows the list):

  1. Suspend the process
    1. press Ctrl+Z
  2. Resume the process in the background
    1. bg
  3. Disown the process
    1. disown %1
  4. Launch a screen session
    1. screen
  5. Find the PID of the process
    1. pgrep myappname
  6. Use reptyr to take over the process
    1. reptyr 1234
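
Putting it all together, a condensed session might look like this (the build command and the PID 1234 are just illustrative placeholders):

$ make -j8      # the long-running job started in the current shell
^Z              # Ctrl+Z suspends it
$ bg            # resume it in the background
$ disown %1     # detach it from this shell's job table
$ screen        # open a screen session
$ pgrep make    # find the PID, e.g. 1234
$ reptyr 1234   # re-attach the process to the screen session's terminal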

 

Note: at the time of writing, reptyr is not available in any Fedora/Red Hat repo, so we’ll need to compile it from source:

$ git clone https://github.com/nelhage/reptyr.git
$ cd reptyr/
$ make
$ sudo make install
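
Note (an assumption about your kernel settings, not something covered above): some distros enable Yama ptrace hardening, in which case reptyr may refuse to attach with a permissions error. Temporarily relaxing the restriction usually helps:

$ sudo sysctl -w kernel.yama.ptrace_scope=0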

Getting the latest EMR release label


The latest release label is usually published on EMR’s What’s New page.

So one way to get the latest EMR release label is:

 

curl -s https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-whatsnew.html |grep "(Latest)"|head -n1|awk '{ print $3 }'
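
If you want to consume that value from a script, a small sketch could look like this (the create-cluster line is only an illustration and assumes the AWS CLI and the default EMR roles are already set up):

LATEST=$(curl -s https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-whatsnew.html | grep "(Latest)" | head -n1 | awk '{ print $3 }')
echo "Latest EMR release label: $LATEST"
# e.g. launch a small cluster with it:
# aws emr create-cluster --use-default-roles --release-label "$LATEST" --instance-type m5.xlarge --instance-count 3 --applications Name=Hadoop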

 

Have fun!

Update protobuf to 2.5 on CentOS 6


Update protobuf to 2.5 on CentOS 6:

$ cd /usr/local/src/
$ wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
$ tar xvf protobuf-2.5.0.tar.gz
$ cd protobuf-2.5.0
$ ./autogen.sh
$ ./configure --prefix=/usr
$ make
$ sudo make install
$ protoc --version

Install protobuf for Java

$ cd java
$ mvn install
$ mvn package

You should be good to go.

If you want to be able to install several protobuf versions side by side, install stow and change ./configure --prefix=/usr to ./configure --prefix=/usr/local/stow/protobuf-2.5.0, as in the sketch below.
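
For example, the stow-based build of the same source tree would look like this:

$ cd /usr/local/src/protobuf-2.5.0
$ ./configure --prefix=/usr/local/stow/protobuf-2.5.0
$ make
$ sudo make install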

Then link protobuf into your system with stow:

$ cd /usr/local/stow
$ stow protobuf-2.5.0

Note: stow uses /usr/local/bin by default. Make sure that’s in your $PATH.

To unlink that version of protobuf:

$ stow -D protobuf-2.5.0

Upgrading glibc from version 2.12 to 2.14 on CentOS


You cannot safely update glibc on CentOS 6, but you can easily install 2.14 alongside 2.12 and use it to compile projects (a linking sketch follows the steps).

  1. mkdir ~/glibc_install; cd ~/glibc_install
  2. wget http://ftp.gnu.org/gnu/glibc/glibc-2.14.tar.gz
  3. tar zxvf glibc-2.14.tar.gz
  4. cd glibc-2.14
  5. mkdir build
  6. cd build
  7. ../configure --prefix=/opt/glibc-2.14
  8. make -j4
  9. sudo make install
  10. export LD_LIBRARY_PATH=/opt/glibc-2.14/lib
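
To actually build and link a program against the alternate glibc you usually need to point the linker at it explicitly, not just set LD_LIBRARY_PATH. A minimal sketch (myapp.c is just a placeholder; the dynamic linker path assumes an x86_64 install under /opt/glibc-2.14):

$ gcc -o myapp myapp.c \
    -Wl,--rpath=/opt/glibc-2.14/lib \
    -Wl,--dynamic-linker=/opt/glibc-2.14/lib/ld-linux-x86-64.so.2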

 

Re-indexing the Outlook Spotlight index on Mac


If you search in Outlook and get no results, or only partial results, the Spotlight index is most probably corrupted.

How to fix it:

  1. Restart the Mac, so that it restarts the Spotlight services.
  2. Navigate to Finder > Applications > Utilities > Terminal.
  3. Type mdimport -L.
  4. Important: If you see more than one instance of “Microsoft Outlook Spotlight Importer.mdimporter,” delete the Outlook application that you are not using, empty it from the Trash, restart your Mac, and go back to step 1.
  5. In the Terminal, reindex your Outlook database by using the following command and substituting your own user name for the <user_name> placeholder:

    mdimport -g "/Applications/Microsoft Outlook.app/Contents/Library/Spotlight/Microsoft Outlook Spotlight Importer.mdimporter" -d1 "/Users/<user_name>/Library/Group Containers/UBF8T346G9.Office/Outlook/Outlook 15 Profiles/<my_profile_name>"

    Note: In this command, the path after "-g" is the default path of the Outlook installation, and the path after "-d1" is the default path of your profile, where <my_profile_name> is "Main Profile" by default. Substitute your actual paths if you have renamed your profile or installed Outlook in a different location.

  6. Reindexing will take some time. After it finishes, quit and then restart Outlook.

Have a look at the source of this information for further guidance: https://support.microsoft.com/en-us/help/2741535/outlook-for-mac-search-returns-no-results-and-task-items-are-not-displ

Adding a mount point to HDFS


Before proceeding:

This procedure assumes that you don’t have any useful data currently on HDFS: all data will be lost after adding mount points with this method.

This procedure should be applied to every datanode in the cluster. No intervention on the master node is needed if the framework is configured properly.

#checking available block devices:
[ec2-user@ip-10-0-15-76 media]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme2n1 259:4 0 2.5T 0 disk
nvme1n1 259:3 0 2.5T 0 disk /media/ebs0
nvme4n1 259:6 0 2.5T 0 disk
nvme0n1 259:0 0 2G 0 disk
├─nvme0n1p1 259:1 0 2G 0 part /
└─nvme0n1p128 259:2 0 1M 0 part
nvme3n1 259:5 0 2.5T 0 disk

#checking formatted filesystem:
[ec2-user@ip-10-0-15-76 media]$ sudo file -s /dev/nvme2n1
/dev/nvme2n1: data

(this device has no filesystem yet)

#formatting to ext4:
[ec2-user@ip-10-0-15-76 media]$ sudo mkfs -t ext4 /dev/nvme2n1
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 655360000 4k blocks and 163840000 inodes
Filesystem UUID: 6d9c997f-d47b-4529-85c8-e56e8ef47a1d
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

#mounting
[ec2-user@ip-10-0-15-76 media]$ sudo mkdir /media/ebs1
[ec2-user@ip-10-0-15-76 media]$ sudo mount /dev/nvme2n1 /media/ebs1
[ec2-user@ip-10-0-15-76 media]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme2n1 259:4 0 2.5T 0 disk /media/ebs1
nvme1n1 259:3 0 2.5T 0 disk /media/ebs0
nvme4n1 259:6 0 2.5T 0 disk
nvme0n1 259:0 0 2G 0 disk
├─nvme0n1p1 259:1 0 2G 0 part /
└─nvme0n1p128 259:2 0 1M 0 part
nvme3n1 259:5 0 2.5T 0 disk

#final mount result
[ec2-user@ip-10-0-60-46 ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme2n1 259:4 0 2.5T 0 disk /media/ebs1
nvme1n1 259:3 0 2.5T 0 disk /media/ebs0
nvme4n1 259:6 0 2.5T 0 disk /media/ebs3
nvme0n1 259:0 0 2G 0 disk
├─nvme0n1p1 259:1 0 2G 0 part /
└─nvme0n1p128 259:2 0 1M 0 part
nvme3n1 259:5 0 2.5T 0 disk /media/ebs2

#checking mount points in hdfs-site.xml
[ec2-user@ip-10-0-60-46 media]$ cat /opt/hadoop-2.7.3/etc/hadoop/hdfs-site.xml |grep -A1 dfs.datanode.data.dir
<name>dfs.datanode.data.dir</name>
<value>/media/ebs0/hadoop/datanodes,/media/ebs1/hadoop/datanodes,/media/ebs2/hadoop/datanodes,/media/ebs3/hadoop/datanodes</value>

# create defined directory structure on mount point (for each mount point):
sudo mkdir -p /media/ebs1/hadoop/datanodes

# modify owner to the user that will start DFS (for each mount point):
sudo chown -R ec2-user:ec2-user /media/ebs1/hadoop/datanodes
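
# (optional, hedged shortcut) prepare all four mount points from hdfs-site.xml in one loop:
for d in /media/ebs0 /media/ebs1 /media/ebs2 /media/ebs3; do
  sudo mkdir -p $d/hadoop/datanodes
  sudo chown -R ec2-user:ec2-user $d/hadoop/datanodes
done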

#format namenode:
hadoop namenode -format

# stop/start DFS:
/opt/hadoop-2.7.3/sbin/stop-dfs.sh
/opt/hadoop-2.7.3/sbin/start-dfs.sh

# check service start status
tail -f /var/log/hadoop/hadoop-ec2-user-datanode-ip-10-0-15-76.log
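
Two optional follow-ups, sketched from the listing above: the mounts are not persisted, so they will disappear on reboot unless you also add them to /etc/fstab (the UUID comes from blkid and will differ per volume), and you can confirm that the datanodes picked up the new directories with dfsadmin:

# get the filesystem UUID (the example value is the one from the mkfs output above)
$ sudo blkid /dev/nvme2n1
/dev/nvme2n1: UUID="6d9c997f-d47b-4529-85c8-e56e8ef47a1d" TYPE="ext4"

# /etc/fstab entry, one per mount point:
UUID=6d9c997f-d47b-4529-85c8-e56e8ef47a1d  /media/ebs1  ext4  defaults,nofail  0  2

# check that all datanodes report the expected capacity:
$ hdfs dfsadmin -report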

 

Some environment variables I usually use in these environments:

export HADOOP_SSH_OPTS="-i /home/ec2-user/.ssh/mykey -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.151.x86_64/jre

HBase and Zookeeper debugging


I came across some scenarios where an application (e.g. MapReduce) communicating with HBase through YARN could silently fail with a timeout like the following:

2017-01-30 19:42:03,657 DEBUG [main] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: locateRegionInMeta parentTable=hbase:meta, metaLocation=, attempt=9 of 35 failed; retrying after sleep of 10095 because: Failed after attempts=36, exceptions:
Mon Jan 30 19:42:03 UTC 2017, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68463: row 'test2,#cmrNo acctNo,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-172-31-3-246.us-west-2.compute.internal,16000,1485539268192, seqNum=0

The root cause of this behavior wasn’t any misconfiguration on the server/networking side, but a library missing from the classpath.

When there is a ZooKeeper issue, depending on the retry parameters, the underlying exceptions may not be visible.

In this case, I added/modified the following parameters in the MapReduce Java application, which gave more visibility into the communication layer between ZooKeeper and HBase:

conf.set("hbase.client.retries.number", Integer.toString(1));
conf.set("zookeeper.session.timeout", Integer.toString(60000));
conf.set("zookeeper.recovery.retry", Integer.toString(1));


After this, the following exception was visible:

Exception: java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge
 at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:157)
 at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: com.google.protobuf.ServiceException: java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge

 

Playing with these parameters will make the application exit quickly when there is a problem with the cluster, which can be desirable in a production environment.

Reducing the parameters to more conservative values could yield better recovery times. Setting zookeeper.recovery.retry to 0 will still result in up to two connection attempts to every ZooKeeper server in the quorum, and will cause an application failure in under a minute should ZooKeeper connectivity be lost during execution.

 

As an additional note, if you are receiving timeouts because the application is trying to contact localhost instead of the quorum servers, you can set these parameters explicitly:

// HBase through MR on YARN is trying to connect to localhost instead of the quorum.
conf.set("hbase.zookeeper.quorum","172.31.3.246");
conf.set("hbase.zookeeper.property.clientPort","2181");

 

I’ve added a couple of examples of MapReduce applications for HBase here: https://github.com/hvivani/bigdata/tree/master/hbase

 

Some additional notes on this behavior: https://discuss.pivotal.io/hc/en-us/articles/200933006-Hbase-application-hangs-indefinitely-connecting-to-zookeeper