Import syntax:
sqoop import --connect jdbc:mysql://localhost/testdb \
    --table PERSON --username test --password ****
This command will generate a series of tasks (a tuning sketch follows the list):
Generate SQL code
Execute SQL code
Generate MapReduce jobs
Execute MapReduce jobs
Transfer data to local files or HDFS
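The transfer itself runs as parallel map tasks, which you can steer from the same command line. A minimal sketch; the HDFS target directory, split column, and mapper count below are assumed, illustrative values, not from the text:

# Same import, with an explicit HDFS target and four parallel map tasks,
# split across mappers on an assumed ID column
sqoop import --connect jdbc:mysql://localhost/testdb \
    --table PERSON --username test --password **** \
    --target-dir /user/localadmin/PERSON \
    --split-by ID --num-mappers 4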
Export syntax:
sqoop export --connect jdbc:mysql://localhost/testdb \
    --table CLIENTS_INTG --username test --password **** \
    --export-dir /user/localadmin/CLIENTS
This command will generate a series of tasks (a safer variant is sketched after the list):
Generate MapReduce jobs
Execute MapReduce jobs
Transfer data from local files or HDFS
Compile SQL code
Create or insert into the CLIENTS_INTG table
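Because the export inserts rows while the map tasks run, a failed job can leave the target table partially loaded. A hedged sketch of one standard Sqoop1 safeguard, --staging-table; the staging table name here is an assumed example, and the table must already exist with the same schema as the target:

# Land rows in a staging table first, then move them to CLIENTS_INTG in one step
sqoop export --connect jdbc:mysql://localhost/testdb \
    --table CLIENTS_INTG --username test --password **** \
    --staging-table CLIENTS_INTG_STAGE --clear-staging-table \
    --export-dir /user/localadmin/CLIENTS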
There are many features of Sqoop1 that are easy to learn and implement. On the command line you can specify whether the import goes directly to Hive, HDFS, or HBase, as sketched below. There are direct connectors to the most popular databases: Oracle, SQL Server, MySQL, Teradata, and PostgreSQL.
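A minimal sketch of the Hive and HBase targets; the -P console prompt for the password and the HBase column family name are assumed values, not from the text:

# Import straight into a Hive table (assumes Hive is configured on this node)
sqoop import --connect jdbc:mysql://localhost/testdb \
    --table PERSON --username test -P \
    --hive-import

# Import straight into an HBase table (assumed table name and column family)
sqoop import --connect jdbc:mysql://localhost/testdb \
    --table PERSON --username test -P \
    --hbase-table PERSON --column-family cf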
There are evolving challenges with Sqoop1, including the following:
Cryptic command-line arguments
Nonsecure connectivity, which is a security risk (a mitigation is sketched after this list)
No metadata repository, which limits reuse
Program-driven installation and management
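The plaintext --password argument is the most visible part of the connectivity risk, and Sqoop1 itself offers two safer alternatives. A minimal sketch; the password file path is an assumed example, and --password-file is only available in later Sqoop1 releases:

# Prompt for the password at the console instead of placing it on the command line
sqoop import --connect jdbc:mysql://localhost/testdb \
    --table PERSON --username test -P

# Or read it from a permission-restricted file (assumed path)
sqoop import --connect jdbc:mysql://localhost/testdb \
    --table PERSON --username test \
    --password-file /user/localadmin/.mysql.password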
Sqoop2 is the next generation of the data transfer architecture, designed to solve the limitations of Sqoop1, namely (Fig. 2.18):
Sqoop2 has a web-enabled UI
Sqoop2 will be driven by a Sqoop Server architecture
Sqoop2 will provide greater connector flexibility; apart from JDBC, many native connectivity options can be customized by providers
Sqoop2 will have a REST API interface (a sketch follows this list)
Sqoop2 will have its own metadata store
Sqoop2 will add credentials management capabilities, which will provide trusted connection capabilities
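As a small illustration of the REST interface, the call below asks a Sqoop2 server for its version; the hostname and the default server port of 12000 are assumptions about a local installation:

# Query a local Sqoop2 server over its REST API (assumed host and default port)
curl http://localhost:12000/sqoop/version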
The proposed architecture of Sqoop2 is shown in Fig. 2.19. For more information on Sqoop status and issues, please see the Apache Foundation website.