Page 50 - Building Big Data Applications
Pig Latin statements can be executed in either local mode or MapReduce mode,
interactively or as batch programs.
In local mode, Pig runs in a single JVM and accesses the local filesystem. This
mode is suitable only for small datasets and can be run on minimal infrastructure.
In MapReduce mode, Pig translates programs (queries and statements) into
MapReduce jobs and runs them on a Hadoop cluster. Production environments for
running Pig are deployed in this mode.
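To make "translates programs into MapReduce jobs" a little more concrete, here is a conceptual model in plain Python. It mimics the map, shuffle, and reduce phases that a simple GROUP-and-COUNT script roughly becomes. It is not Pig's actual compiled execution plan. The function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical. The Pig Latin in the comment uses the standard LOAD, GROUP, and FOREACH ... GENERATE statements.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Conceptual model only: roughly what a Pig script such as
#   words  = LOAD 'input' AS (word:chararray);
#   grpd   = GROUP words BY word;
#   counts = FOREACH grpd GENERATE group, COUNT(words);
# turns into once Pig compiles it into a MapReduce job.

def map_phase(records: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (key, 1) for each input record; the GROUP key is the word itself."""
    for word in records:
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Collect all values that share a key, as Hadoop does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Aggregate each group, mirroring COUNT(words) per group."""
    return {key: sum(values) for key, values in groups.items()}

def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases, the way one compiled job would."""
    return reduce_phase(shuffle(map_phase(records)))

if __name__ == "__main__":
    print(run_job(["pig", "hadoop", "pig"]))  # {'pig': 2, 'hadoop': 1}
```

In MapReduce mode, Hadoop distributes the map and reduce phases across the cluster. This sketch runs every phase in one process, which is closer in spirit to local mode.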
Pig data types
The Pig language supports the following data types:
Scalar types: int, long, float, double, chararray, bytearray
Complex types:
map: associative array of key/value pairs (keys are chararrays)
tuple: ordered list of fields; each field may be of any scalar or complex type
bag: unordered collection of tuples
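One way to picture these types is to map them onto Python values. The sketch below rests on illustrative assumptions and is not part of Pig. Bag is a hypothetical list subclass standing in for a bag. A Python float stands in for a Pig double. Integers outside the signed 32-bit range are treated as long. Types not listed above are rejected.

```python
from typing import Any

class Bag(list):
    """Hypothetical stand-in for a Pig bag: a collection of tuples whose order is not significant."""

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # Pig int is a signed 32-bit integer

def pig_type(value: Any) -> str:
    """Return the Pig type name that a Python value roughly corresponds to."""
    if isinstance(value, bool):
        raise TypeError("boolean is not covered in this sketch")
    if isinstance(value, int):
        return "int" if INT_MIN <= value <= INT_MAX else "long"
    if isinstance(value, float):
        return "double"  # Python floats are 64-bit, like Pig double
    if isinstance(value, str):
        return "chararray"
    if isinstance(value, (bytes, bytearray)):
        return "bytearray"
    if isinstance(value, Bag):
        if not all(isinstance(t, tuple) for t in value):
            raise TypeError("a bag may only contain tuples")
        return "bag"
    if isinstance(value, tuple):
        return "tuple"  # fields may be any scalar or complex value
    if isinstance(value, dict):
        if not all(isinstance(k, str) for k in value):
            raise TypeError("map keys must be chararrays (str)")
        return "map"
    raise TypeError(f"no Pig equivalent for {type(value).__name__}")

if __name__ == "__main__":
    record = ("alice", 42, Bag([("math", 90), ("art", 75)]), {"city": "Oslo"})
    print([pig_type(field) for field in record])  # ['chararray', 'int', 'bag', 'map']
    print(pig_type(record))                        # tuple
```

The example record shows the nesting that Pig allows. A tuple holds a bag of tuples and a map next to scalar fields.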
Running Pig programs
Pig programs can be run in three ways, all of which work in both local and
MapReduce mode (for more details, see the Apache Pig Wiki page):
Script driven: A Pig program can be run as a script file, processed from the command
line.
Grunt shell: An interactive shell for running Pig commands.
Embedded: You can run Pig programs from Java using the PigServer class, much as
JDBC lets you run SQL programs from Java.
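From the command line, the script-driven and Grunt approaches differ only in whether a script file is passed to the pig launcher. The launcher's -x flag selects the execution mode (-x local or -x mapreduce). The helper pig_command below is hypothetical and only builds the argument list; it does not run Pig. Newer Pig releases accept further -x values. This sketch rejects them to stay within the two modes described here.

```python
import shlex
from typing import List, Optional

# Execution modes covered by this sketch (values for pig's -x flag)
VALID_MODES = ("local", "mapreduce")

def pig_command(mode: str = "mapreduce", script: Optional[str] = None) -> List[str]:
    """Build the argv for launching Pig (hypothetical helper).

    With a script path, Pig runs the script as a batch program.
    Without one, Pig starts the interactive Grunt shell.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    argv = ["pig", "-x", mode]
    if script is not None:
        argv.append(script)
    return argv

if __name__ == "__main__":
    print(shlex.join(pig_command("local", "wordcount.pig")))  # pig -x local wordcount.pig
    print(shlex.join(pig_command("local")))                    # pig -x local
```

To launch Pig for real, pass the list to subprocess.run(argv, check=True) on a machine with pig on the PATH. The script name wordcount.pig is only an example. Embedded mode is not shown because it needs the Pig JARs and Java's PigServer class rather than the command line.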
Pig program flow
Pig offers many built-in commands and a rich syntax for controlling program flow;
here we look only at the core execution model. A typical Pig program is built around
LOAD, DUMP, and STORE statements:
A LOAD statement reads data from the filesystem.
A series of “transformation” statements processes the data.
A STORE statement writes output to the filesystem.
A DUMP statement displays output on the screen.
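The load, transform, and output flow described above can be mirrored in ordinary Python to show the role of each step. This sketch makes several assumptions. The file names and the two-column (name, age) schema are hypothetical, as are the functions parse, load, filter_adults, dump, and store. store writes a single file, whereas Pig's STORE writes a directory of part files. The Pig Latin in the comment uses the standard LOAD, FILTER, DUMP, and STORE statements with PigStorage(',').

```python
import csv
from typing import Iterable, List, Tuple

Record = Tuple[str, int]  # (name:chararray, age:int)

# Rough Python analogue of this Pig Latin flow (file names are hypothetical):
#   people = LOAD 'people.csv' USING PigStorage(',') AS (name:chararray, age:int);
#   adults = FILTER people BY age >= 18;
#   DUMP adults;
#   STORE adults INTO 'adults_out' USING PigStorage(',');

def parse(lines: Iterable[str]) -> List[Record]:
    """Turn comma-separated text lines into (name, age) tuples."""
    return [(name, int(age)) for name, age in csv.reader(lines)]

def load(path: str) -> List[Record]:
    """LOAD: read records from a file on the local filesystem."""
    with open(path, newline="") as f:
        return parse(f)

def filter_adults(records: Iterable[Record]) -> List[Record]:
    """Transformation step, like FILTER people BY age >= 18."""
    return [r for r in records if r[1] >= 18]

def dump(records: Iterable[Record]) -> None:
    """DUMP: print each tuple to the screen in Pig's (a,b) style."""
    for name, age in records:
        print(f"({name},{age})")

def store(records: Iterable[Record], path: str) -> None:
    """STORE: write records as comma-separated lines (simplified to one file)."""
    with open(path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(records)

if __name__ == "__main__":
    adults = filter_adults(parse(["alice,34", "bob,12", "carol,19"]))
    dump(adults)  # prints (alice,34) then (carol,19)
```

Keeping parse separate from load lets the transformation logic run on in-memory lines, much as local mode lets you try a Pig script on a small sample before running it on a cluster.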
Common Pig commands
LOAD: Read data from the filesystem
STORE: Write data to the filesystem