Page 61 - Building Big Data Applications

Chapter 2   Infrastructure and technology  55





















                                             FIGURE 2.16 Sqoop1 architecture.


                   Use Hive and HDFS for data processing.
                   Use Oozie for scheduling and managing jobs.

                   Installing Sqoop: currently you can download and install Sqoop from the Apache
                 Software Foundation homepage or from any Hadoop distribution. The installation is
                 manual, and every configuration step must be completed without omission (Fig. 2.16).
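
As a rough illustration of what the manual configuration involves, the following Python sketch checks a client machine for a few settings that a Sqoop 1 installation commonly depends on. The list of environment variables, the expectation that JDBC driver jars sit in a lib directory under SQOOP_HOME, and the function name check_sqoop_client are assumptions made for this example, not an official checklist; the documentation for your Sqoop release or Hadoop distribution remains the authoritative source.

```python
import os
from pathlib import Path

# Assumption: variables a typical Sqoop 1 client setup relies on.
# Adjust this tuple to match your own distribution's instructions.
REQUIRED_VARS = ("SQOOP_HOME", "HADOOP_COMMON_HOME", "HADOOP_MAPRED_HOME")


def check_sqoop_client(env):
    """Return a list of problems found in `env`; an empty list means
    none of the checks below failed (it does not prove the setup works)."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} points to a missing directory: {value}")

    # Assumption: JDBC driver jars are placed under $SQOOP_HOME/lib.
    sqoop_home = env.get("SQOOP_HOME")
    if sqoop_home and Path(sqoop_home).is_dir():
        lib_dir = Path(sqoop_home) / "lib"
        if not lib_dir.is_dir() or not any(lib_dir.glob("*.jar")):
            problems.append(f"no jar files found in {lib_dir}")
    return problems


if __name__ == "__main__":
    # Report on the current shell environment.
    for problem in check_sqoop_client(os.environ):
        print("WARNING:", problem)
```

Passing the environment in as a plain mapping keeps the check easy to try against a hypothetical configuration before touching a real machine.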
                   Sqoop is driven entirely by the client-side installation and depends heavily on
                 JDBC technology, as the first release of Sqoop was developed in Java. In the workflow
                 shown in Fig. 2.17, you can import and export data from any database with simple
                 commands executed from, for example, a command line interface (CLI).
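
To make the idea of simple import and export commands concrete, here is a minimal Python sketch that assembles the argument lists for a Sqoop 1 import and export and prints them as shell commands. It does not run Sqoop. The helper names sqoop_import_cmd and sqoop_export_cmd, the JDBC URL, the user name, the table name, and all paths are hypothetical values chosen for illustration; only the sqoop options themselves (--connect, --username, --password-file, --table, --target-dir, --num-mappers, --export-dir) come from the Sqoop 1 command line tool.

```python
import shlex


def sqoop_import_cmd(jdbc_url, username, password_file, table,
                     target_dir, mappers=1):
    """Build the argv for copying one database table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC connection string
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,                  # source table in the database
        "--target-dir", target_dir,        # destination directory in HDFS
        "--num-mappers", str(mappers),     # number of parallel map tasks
    ]


def sqoop_export_cmd(jdbc_url, username, password_file, table, export_dir):
    """Build the argv for copying an HDFS directory back into a table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,                  # destination table in the database
        "--export-dir", export_dir,        # source directory in HDFS
    ]


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    url = "jdbc:mysql://dbhost.example.com/sales"
    imp = sqoop_import_cmd(url, "etl_user", "/user/etl/.db_password",
                           "orders", "/data/raw/orders", mappers=4)
    exp = sqoop_export_cmd(url, "etl_user", "/user/etl/.db_password",
                           "orders_summary", "/data/out/orders_summary")
    # Print the commands instead of executing them.
    print(shlex.join(imp))
    print(shlex.join(exp))
```

Building the command as a list rather than a single string avoids quoting mistakes if the list is later handed to a process launcher; the printed form can also be pasted into a terminal on a machine where the Sqoop client is installed.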






























                                  FIGURE 2.17 Hive process flow. Image source: HUG discussions.