Page 50 - Building Big Data Applications



                Pig Latin statements can be executed in either local or MapReduce mode, either
             interactively or as batch programs.
               In local mode, Pig runs in a single JVM and accesses the local filesystem. This
                mode is suitable only for small datasets and can be run on minimal infrastructure.
               In MapReduce mode, Pig translates programs (queries and statements) into
                MapReduce jobs and runs them on a Hadoop cluster. Production environments for
                running Pig are deployed in this mode.

             Pig data types

             The Pig language supports the following data types:

               Scalar types: int, long, float, double, chararray, bytearray
               Complex types:
               map: associative array
               tuple: ordered list of fields; each field may be of any scalar or complex type
               bag: unordered collection of tuples
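
             The complex types above can be pictured with rough Python analogues. This is a
             sketch for intuition only, not Pig code: the sample values and the helper
             `bags_equal` are hypothetical, and Python's tuple, dict, and list only
             approximate Pig's tuple, map, and bag.

```python
from collections import Counter

# tuple: an ordered list of fields; a field may be scalar or complex.
record = ("alice", 34, {"city": "Austin"})

# map: an associative array from keys to values.
attributes = {"city": "Austin", "plan": "pro"}

# bag: an unordered collection of tuples. Duplicates are allowed, so a
# list whose order we ignore is a closer analogue than a set.
bag = [("alice", 34), ("bob", 27), ("alice", 34)]


def bags_equal(a, b):
    """Compare two bags, ignoring order but counting duplicates."""
    return Counter(a) == Counter(b)
```

             Comparing with `Counter` reflects the idea that a bag has no defined order but
             may hold the same tuple more than once.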


             Running Pig programs

             Pig programs can be run in three ways, all of which work in both local and
                MapReduce mode (for more details see the Apache Pig Wiki page):
                Script: A Pig program can be run as a script file, processed from the command
             line.
                Grunt shell: An interactive shell for running Pig commands.
                Embedded: You can run Pig programs from Java, much as JDBC is used to run
             traditional SQL programs from Java.
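
             As an illustration of the script option, the sketch below builds the command
             line for running a script in either execution mode. It assumes the `pig`
             launcher is on the PATH and uses its `-x` option to select the mode; the
             function `pig_command` and the script name are hypothetical, and the command
             is only constructed here, not executed.

```python
# Execution modes described in this section.
VALID_MODES = ("local", "mapreduce")


def pig_command(script_path, mode="local"):
    """Return the argument list for running a Pig script in the given mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown execution mode: {mode!r}")
    # 'pig -x <mode> <script>' selects local or MapReduce execution.
    return ["pig", "-x", mode, script_path]
```

             The returned list could be handed to `subprocess.run` on a machine where Pig is
             installed. Running `pig -x local` with no script opens the Grunt shell instead.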


             Pig program flow

             Pig has many built-in commands and a rich syntax for program control. Here we look
             at the core execution model. A Pig program is built around the LOAD, DUMP, and
             STORE statements.
               A LOAD statement reads data from the filesystem.
               A series of "transformation" statements processes the data.
               A STORE statement writes output to the filesystem.
               A DUMP statement displays output on the screen.
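
             The flow above can be mimicked in plain Python to make the stages concrete.
             This is a minimal sketch, not Pig: the function names, the comma-separated
             input format, and the age filter are all assumptions made for illustration.
             Pig itself builds an execution plan and runs it only when it reaches a DUMP or
             STORE, whereas this sketch runs each step eagerly.

```python
import csv


def load(path):
    """LOAD: read comma-separated rows from the filesystem as tuples."""
    with open(path, newline="") as f:
        return [tuple(row) for row in csv.reader(f)]


def filter_rows(rows, predicate):
    """A transformation step, loosely comparable to Pig's FILTER."""
    return [row for row in rows if predicate(row)]


def store(rows, path):
    """STORE: write the result to the filesystem."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def dump(rows):
    """DUMP: display the result on the screen."""
    for row in rows:
        print(row)


def run_pipeline(in_path, out_path):
    """Load (name, age) rows, keep ages of 18 and over, then store and dump."""
    rows = load(in_path)
    adults = filter_rows(rows, lambda r: int(r[1]) >= 18)
    store(adults, out_path)
    dump(adults)
    return adults
```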


             Common Pig commands

                LOAD: Read data from the filesystem
                STORE: Write data to the filesystem