Installing Sqoop


Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

To install Sqoop, run the following commands:

wget https://archive.apache.org/dist/sqoop/1.4.5/sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz 

tar -xzf sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz 

mv sqoop-1.4.5.bin__hadoop-2.0.4-alpha /home/hadoop/sqoop 

wget https://s3-eu-west-1.amazonaws.com/vivanih-emr/sqoop/mysql-connector-java-5.1.30.tar.gz 

tar -xzf mysql-connector-java-5.1.30.tar.gz 

cp mysql-connector-java-5.1.30/mysql-connector-java-5.1.30-bin.jar /home/hadoop/sqoop/lib/ 

export PATH=$PATH:/home/hadoop/sqoop/bin
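The export above only affects the current shell session. A small sketch of making it permanent, assuming bash is the login shell:

```shell
# Append the PATH update to ~/.bashrc so new shells pick it up (bash assumed)
echo 'export PATH=$PATH:/home/hadoop/sqoop/bin' >> ~/.bashrc

# Verify that the sqoop binary is now resolvable
command -v sqoop
```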

A basic command to copy a table from MySQL to HDFS is:

sqoop import \
  --connect jdbc:mysql://shard3copy.cbmfrobfdt6y.us-east-1.rds.amazonaws.com:3306/shard3copy \
  --username root \
  --password hadoop123 \
  --table yourtable
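Note that a password given with --password is visible in ps output and shell history. A safer variant of the same import, using Sqoop's -P flag to prompt for the password interactively (host, database, and table unchanged from above):

```shell
# Same import as above, but prompting for the password instead of
# passing it on the command line
sqoop import \
  --connect jdbc:mysql://shard3copy.cbmfrobfdt6y.us-east-1.rds.amazonaws.com:3306/shard3copy \
  --username root \
  -P \
  --table yourtable
```

Sqoop also accepts --password-file pointing at an HDFS or local file if the job must run unattended.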

Now, if you need to make the data available in Hive (that is, create the table in the Hive metastore), you can use a free-form query import. The WHERE \$CONDITIONS placeholder is required in such queries: Sqoop replaces it with a per-mapper range predicate derived from the --split-by column, so each mapper imports a disjoint slice of the data.

sqoop import \
  --connect jdbc:mysql://mysqlhost/mysqldb \
  --username user \
  --password passwd \
  --query "SELECT table1.date_col, table1.big_int_col, table1.int_col, table1.varchar_col, table1.float_col FROM MYSQL_TABLE1 AS table1 WHERE \$CONDITIONS" \
  --split-by table1.date_col \
  --hive-import \
  --hive-table hivedb.hive_table1 \
  --target-dir hive_table1_data