List AWS Security Groups open to 0.0.0.0/0

Posted on October 19, 2022 by hvivani

We all now that opening security groups to the world is a bad practice, right? Maybe we can audit these running some queries like the following:

Inbound:

$ aws ec2 –region us-west-2 describe-security-groups –filter Name=ip-permission.cidr,Values=’0.0.0.0/0′ –query “SecurityGroups[*].{Name:GroupName,ID:GroupId}” –output table

| DescribeSecurityGroups |
+-----------------------+-------------------+
| ID | Name |
+-----------------------+-------------------+
| sg-0b977c7003c7b280 | launch-wizard-2 |
+-----------------------+-------------------+

Outbound:

$ aws ec2 –region us-west-2 describe-security-groups –filter Name=egress.ip-permission.cidr,Values=’0.0.0.0/0′ –query “SecurityGroups[*].{Name:GroupName,ID:GroupId}” –output table

| DescribeSecurityGroups |
+-----------------------+-------------------------------------------------+
| ID | Name |
+-----------------------+-------------------------------------------------+
| sg-61de04 | vivXXXX_XX_US_West |
| sg-9eefe3 | CentOS 6  |
| sg-9479a0 | default |
| sg-e6b0a4 | ElasticMapReduce-master |
| sg-077c79703c7b280 | launch-wizard-2 |
| sg-b24375 | default |
+-----------------------+-------------------------------------------------+

EBS Storage Performance Notes – Instance throughput vs Volume throughput

Posted on June 9, 2022 by hvivani

I just wanted to write a couple lines/guidance on this regard as this is a recurring question when configuring storage, not only in the cloud, but can also happen on bare metal servers.

What is throughput on a volume?

Throughput is the measure of the amount of data transferred from/to a storage device per time unit (typically seconds).

The throughput consumed on a volume is calculated using this formula:

IOPS (IO Ops per second) x BS (block size)= Throughput

As example, if we are writing at 1200 Ops/Sec, and the chunk write size is around 125Kb, we will have a total throughput of about 150Mb/sec.

Why is this important?

This is important because we have to be aware of the Maximum Total Throughput Capacity for a specific volume vs the Maximum Total Instance Throughput.

Because, if your instance type (or server) is able to produce a throughput of 1250MiB/s (i.e M4.16xl)) and your EBS Maximum Throughput is 500MiB/s (i.e. ST1), not only you will hit a bottleneck trying to write to the specific volumes, but also throttling might occur (i.e. EBS on cloud services).

How do I find what is the Maximum throughput for EC2 instances and EBS volumes?

Here is documentation about Maximum Instance Throughput for every instance type on EC2: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html

And here about the EBS Maximum Volume throughput: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html

How do I solve the problem ?

If we have an instance/server that has more throughput capabilities than the volume, just add or split the storage capacity into more volumes. So the load/throughput will be distributed across the volumes.

As an example, here are some metrics with different volume configurations:

1 x 3000GB – 9000IOPS volume:

3 x 1000GB – 3000IOPS volume:

Look at some of the metrics: these are using the same instance type (m4.10xl – 500Mb/s throughput), same volume type (GP2 – 160Mb/s throughput) and running the same job:

Using 1 volume, Write/Read Latency is around 20-25 ms/op. This value is high compared to 3x1000GB volumes.
Using 1 volume, Avg Queue length 25. The queue depth is the number of pending I/O requests from your application to your volume. For maximum consistency, a Provisioned IOPS volume must maintain an average queue depth (rounded to the nearest whole number) of one for every 500 provisioned IOPS in a minute. On this scenario 9000/500=18. Queue length of 18 or higher will be needed to reach 9000 IOPS.
Burst Balance is 100%, which is Ok, but if this balance drops to zero (it will happen if volume capacity keeps being exceeded), all the requests will be throttled and you’ll start seeing IO errors.
On both scenarios, Avg Write Size is pretty large (around 125KiB/op) which will typically cause the volume to hit the throughput limit before hitting the IOPS limit.
Using 1 volume, Write throughput is around 1200 Ops/Sec. Having write size around 125Kb, it will consume about 150Mb/sec. (IOPS x BS = Throughput)

log4j vulnerability – quick notes

Posted on December 13, 2021 by hvivani

This issue may lead to remote code execution (RCE) via use of JNDI.

Vendor: Apache Software Foundation
- Product: Apache Log4j
  - <=2.14.1: affects 2.14.1 and prior versions

Fix: log4j 2.15.0

To list all JAR files on your system containing the vulnerable class file (including fat JARs), you can use:

for f in $(find . -name '*.jar' 2>/dev/null); do echo "Checking $f…"; unzip -l "$f" | grep -F org/apache/logging/log4j/core/lookup/JndiLookup.class; done

Additional details here:

https://www.cve.org/CVERecord?id=CVE-2021-44228

Listen on a port/Send data to a port

Posted on January 26, 2020 by hvivani

Listening with nc:

nc -l 8089

Checking that 8089 port is listening:

netstat -nap|grep 8089
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:8089 0.0.0.0:* LISTEN 27543/nc
tcp 0 0 127.0.0.1:8089 127.0.0.1:43592 TIME_WAIT -

Sending data to 8089 port:

telnet localhost 8089
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
hello

listeningOnAPort_SendingDataToAPort

How to: Move a Linux running process to a screen shell session

Posted on July 17, 2019 by hvivani

Use case:

You just started a process (i.e. compile, copy, etc).
You noticed it will take much longer than expected to finish.
You cannot abort or risk the process to get aborted due to the current shell session finishing.
It would be ideal to have this process on ‘screen’ to have it running on backgroud.

We can move it to a screen session with the following steps:

Suspend the process
1. ```
press Ctrl+Z
```
Resume the process in the background
1. ```
bg
```
Disown the process
1. ```
disown %1
```
Launch a screen session
1. ```
screen
```
Find the PID of the process
1. ```
pgrep myappname
```
Use reptyr to take over the process
1. ```
reptyr 1234
```

Note: at the moment of writing this, reptyr is not available on any Fedore/Redhat repo. We’ll need to compile:

$ git clone https://github.com/nelhage/reptyr.git

$ cd reptyr/

$ make

$ sudo make install

Update protobuf to 2.5 on Centos 6

Posted on February 27, 2019 by hvivani

Update protobuf to 2.5 on Centos 6:

$ cd /usr/local/src/
$ wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
$ tar xvf protobuf-2.5.0.tar.gz
$ cd protobuf-2.5.0
$ ./autogen.sh
$ ./configure --prefix=/usr
$ make
$ make install
$ protoc --version

Install protobuf for java

$ cd java
$ mvn install
$ mvn package

You should be good to go.

To enable you to install different versions of protobuf, install stow then change ./configure --prefix=/usr to ./configure --prefix=/usr/local/stow/protobuf-2.5.0

Then link protobuf into your system with stow:

$ cd /usr/local/stow
$ stow protobuf-2.5.0

Note: stow uses /usr/local/bin by default. Make sure thats in your $PATH

To unlink that version of protobuf,

$ stow -D protobuf-2.5.0

upgrading glibc from version 2.12 to 2.14 on CentOS

Posted on February 27, 2019 by hvivani

You cannot update glibc on Centos 6 safely, but, you can install 2.14 alongside 2.12 easily, then use it to compile projects etc.

mkdir ~/glibc_install; cd ~/glibc_install
wget http://ftp.gnu.org/gnu/glibc/glibc-2.14.tar.gz
tar zxvf glibc-2.14.tar.gz
cd glibc-2.14
mkdir build
cd build
../configure –prefix=/opt/glibc-2.14
make -j4
sudo make install
export LD_LIBRARY_PATH=/opt/glibc-2.14/lib

Adding a mount point to HDFS

Posted on October 30, 2017 by hvivani

Before proceeding:

This procedure considers that you don’t have any current useful data on HDFS. All the data will be lost after adding mount points with this method.

This procedure should be applied to every datanode in the cluster. No intervention in the master node is needed if the framework is configured properly.

#checking available block devices:
[ec2-user@ip-10-0-15-76 media]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme2n1 259:4 0 2.5T 0 disk
nvme1n1 259:3 0 2.5T 0 disk /media/ebs0
nvme4n1 259:6 0 2.5T 0 disk
nvme0n1 259:0 0 2G 0 disk
├─nvme0n1p1 259:1 0 2G 0 part /
└─nvme0n1p128 259:2 0 1M 0 part
nvme3n1 259:5 0 2.5T 0 disk

#checking formatted filesystem:
[ec2-user@ip-10-0-15-76 media]$ sudo file -s /dev/nvme2n1
/dev/nvme2n1: data

(this filesystem is not formatted)

#formatting to ext4:
[ec2-user@ip-10-0-15-76 media]$ sudo mkfs -t ext4 /dev/nvme2n1
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 655360000 4k blocks and 163840000 inodes
Filesystem UUID: 6d9c997f-d47b-4529-85c8-e56e8ef47a1d
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

#mounting
[ec2-user@ip-10-0-15-76 media]$ sudo mkdir /media/ebs1
[ec2-user@ip-10-0-15-76 media]$ sudo mount /dev/nvme2n1 /media/ebs1
[ec2-user@ip-10-0-15-76 media]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme2n1 259:4 0 2.5T 0 disk /media/ebs1
nvme1n1 259:3 0 2.5T 0 disk /media/ebs0
nvme4n1 259:6 0 2.5T 0 disk
nvme0n1 259:0 0 2G 0 disk
├─nvme0n1p1 259:1 0 2G 0 part /
└─nvme0n1p128 259:2 0 1M 0 part
nvme3n1 259:5 0 2.5T 0 disk

#final mount result
[ec2-user@ip-10-0-60-46 ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme2n1 259:4 0 2.5T 0 disk /media/ebs1
nvme1n1 259:3 0 2.5T 0 disk /media/ebs0
nvme4n1 259:6 0 2.5T 0 disk /media/ebs3
nvme0n1 259:0 0 2G 0 disk
├─nvme0n1p1 259:1 0 2G 0 part /
└─nvme0n1p128 259:2 0 1M 0 part
nvme3n1 259:5 0 2.5T 0 disk /media/ebs2

#checking mount points in hdfs-site.xml
[ec2-user@ip-10-0-60-46 media]$ cat /opt/hadoop-2.7.3/etc/hadoop/hdfs-site.xml |grep -A1 dfs.datanode.data.dir
<name>dfs.datanode.data.dir</name>
<value>/media/ebs0/hadoop/datanodes,/media/ebs1/hadoop/datanodes,/media/ebs2/hadoop/datanodes,/media/ebs3/hadoop/datanodes</value>

# create defined directory structure on mount point (for each mount point):
sudo mkdir -p /media/ebs1/hadoop/datanodes

# modify owner to the user that will start DFS (for each mount point):
sudo chown -R ec2-user:ec2-user /media/ebs1/hadoop/datanodes

#format namenode:
hadoop namenode -format

# stop/start DFS:
/opt/hadoop-2.7.3/sbin/stop-dfs.sh
/opt/hadoop-2.7.3/sbin/start-dfs.sh

# check service start status
tail -f /var/log/hadoop/hadoop-ec2-user-datanode-ip-10-0-15-76.log

**some ENV variables I usually use on these environments:

export HADOOP_SSH_OPTS="-i /home/ec2-user/.ssh/mykey -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.151.x86_64/jre

AWS EMR – Big Data in Strata New York

Posted on September 23, 2017 by hvivani

Will you be in New York next week (Sept 25th – Sept 28th)?

Come meet the AWS Big Data team at Strata Data Conference, where we’ll be happy to answer your questions, hear about your requirements, and help you with your big data initiatives.

See you there!

Building and Deploying Apache Bigtop Applications

Posted on August 9, 2016 by hvivani

If you want to explore how to build an application for Apache Bigtop and then deploy it using EMR, have a look at this blog post I wrote for Amazon AWS:

http://blogs.aws.amazon.com/bigdata/post/TxNJ6YS4X6S59U/Building-and-Deploying-Custom-Applications-with-Apache-Bigtop-and-Amazon-EMR

Hernan Vivani's Blog

Linux, Big Data, AWS, Astronomy, Running, Cycling… and more

Tag Archives: fedora