Page 50 - Building Big Data Applications


Pig Latin statements can be executed in either local or MapReduce mode, either
interactively or as batch programs.
In local mode, Pig runs in a single JVM and accesses the local filesystem. This
mode is suitable only for small datasets and can be run on minimal infrastructure.
In MapReduce mode, Pig translates programs (queries and statements) into
MapReduce jobs and runs them on a Hadoop cluster. Production environments for
running Pig are deployed in this mode.

Pig data types

The Pig language supports the following data types:

Scalar types: int, long, float, double, chararray, bytearray
Complex types:
map: associative array
tuple: ordered list of fields; each field may be of any scalar or complex type
bag: unordered collection of tuples
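
As a rough illustration of how the three complex types relate, the following Python sketch models them with built-in Python structures. This is an analogy only, not Pig code: the variable names, the sample values, and the helper same_bag are hypothetical, and the Pig literal notation is shown in comments for comparison.

```python
from collections import Counter

# Pig tuple -> Python tuple: an ordered list of fields of any type.
# Pig literal notation: (alice,30)
person = ("alice", 30)

# Pig map -> Python dict: an associative array (Pig map keys are chararrays).
# Pig literal notation: [city#Oslo]
attributes = {"city": "Oslo"}

# Pig bag -> an unordered collection of tuples. Duplicates are allowed,
# so a Counter (a multiset) models it more closely than a list or a set.
# Pig literal notation: {(alice,30),(bob,25),(alice,30)}
people = Counter([("alice", 30), ("bob", 25), ("alice", 30)])

# Complex types nest: a tuple may hold a bag and a map as fields.
record = ("team-a", people, attributes)


def same_bag(left, right):
    """Bags are equal when they hold the same tuples, in any order."""
    return Counter(left) == Counter(right)
```

The Counter is a modelling choice made here to capture "unordered, duplicates allowed"; Pig itself does not expose bags this way.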

Running Pig programs

Pig programs can be run in three ways, all of which work in both local and
MapReduce mode (for more details, see the Apache Pig Wiki page):
Script driven: A Pig program can be run as a script file, processed from the command
line.
Grunt shell: An interactive shell for running Pig commands.
Embedded: You can run Pig programs from Java, much as you would use JDBC to run
SQL programs from Java.
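
To make the combination of execution mode and run style concrete, here is a minimal Python sketch that builds the command line for the pig launcher. It assumes the launcher is installed and on the PATH and relies on its -x option for selecting the execution mode; the helper name pig_command and the script name wordcount.pig are hypothetical.

```python
def pig_command(exec_mode, script=None):
    """Build the argument list for invoking the `pig` launcher.

    exec_mode: "local" or "mapreduce" (passed to pig's -x option).
    script:    path to a Pig Latin script; None starts the Grunt shell.
    """
    if exec_mode not in ("local", "mapreduce"):
        raise ValueError(f"unsupported execution mode: {exec_mode}")
    argv = ["pig", "-x", exec_mode]
    if script is not None:
        argv.append(script)  # script-driven (batch) run
    return argv


# Script-driven run in local mode (hypothetical script name).
print(" ".join(pig_command("local", "wordcount.pig")))
# Interactive Grunt shell against a Hadoop cluster.
print(" ".join(pig_command("mapreduce")))
```

The returned list could be handed to subprocess.run on a machine where Pig is available; the sketch itself only constructs the command and does not execute it.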

Pig program ﬂow

Pig provides many built-in commands and syntax constructs. We will take a look at the
core execution model. A Pig program is typically built around the LOAD, DUMP, and STORE statements:
A LOAD statement reads data from the filesystem.
A series of “transformation” statements process the data.
A STORE statement writes output to the filesystem.
A DUMP statement displays output on the screen.
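
The flow described above can be imitated in plain Python to show the shape of a program: read, transform, then write or display. This is an analogy under stated assumptions, not Pig itself. The functions load, store, dump, and run_flow, the file names, and the sample data are all hypothetical, and the two transformations roughly correspond to what Pig's FILTER and FOREACH ... GENERATE operators would do.

```python
import csv
import os
import tempfile


def load(path):
    """LOAD: read tab-separated rows from the filesystem as tuples."""
    with open(path, newline="") as handle:
        return [tuple(row) for row in csv.reader(handle, delimiter="\t")]


def store(relation, path):
    """STORE: write tuples back to the filesystem, tab-separated."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(relation)


def dump(relation):
    """DUMP: display each tuple on the screen."""
    for row in relation:
        print(row)


def run_flow(in_path, out_path, min_age):
    people = load(in_path)                                  # LOAD
    adults = [p for p in people if int(p[1]) >= min_age]    # transformation: filter rows
    names = [(p[0].upper(),) for p in adults]               # transformation: project a field
    store(names, out_path)                                  # STORE
    dump(names)                                             # DUMP
    return names


# Hypothetical sample input written to a temporary directory.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "people.tsv")
out_path = os.path.join(workdir, "adults.tsv")
with open(in_path, "w") as handle:
    handle.write("alice\t30\nbob\t12\ncarol\t45\n")

result = run_flow(in_path, out_path, min_age=18)
```

One difference worth keeping in mind: this Python version runs each step immediately, whereas Pig builds up a plan from the statements and only executes it when a DUMP or STORE is reached.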

Common Pig commands

LOAD: Read data from the filesystem
STORE: Write data to the filesystem
