Page 59 - Building Big Data Applications


Chapter 2 Infrastructure and technology 53

- The final results are stored in a temporary location. When the entire query
completes, the results are either moved to the target table or partition (for
inserts) or returned to the calling program from the temporary location.
Comparing how Hive executes a query with how a traditional RDBMS does shows that,
because of the schema-on-read design, data placement, partitioning, joining, and storage
can be decided at execution time rather than during planning cycles.

Hive data types

Hive supports the following data types: tinyint, int, smallint, bigint, float, boolean,
string, and double. Special data types include Array, Map (key-value pair), and Struct
(collection of named fields).
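
The types listed above can be seen together in a short HiveQL sketch. The table and column names (employee_profile, skills, phone, address, and so on) are hypothetical and chosen only for illustration; they do not come from the text.

```sql
-- Hypothetical table showing the primitive and special types named above.
CREATE TABLE employee_profile (
  id        BIGINT,
  age       TINYINT,
  dept_code SMALLINT,
  badge_no  INT,
  rating    FLOAT,
  salary    DOUBLE,
  active    BOOLEAN,
  name      STRING,
  skills    ARRAY<STRING>,                                  -- ordered list of values
  phone     MAP<STRING, STRING>,                            -- key-value pairs
  address   STRUCT<street:STRING, city:STRING, zip:STRING>  -- named fields
);

-- Array elements are read by index, map values by key, struct fields by name.
SELECT name,
       skills[0]     AS first_skill,
       phone['home'] AS home_phone,
       address.city  AS city
FROM employee_profile;
```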

Hive query language (HiveQL)
The Hive query language (HiveQL) is an evolving system that supports a lot of SQL
functionality on Hadoop, abstracting the MapReduce complexity away from end users.
It supports traditional SQL features such as select, create table, insert, “from clause”
subqueries, various types of joins (inner, left outer, right outer, and outer joins),
“group by” and aggregations, union all, create table as select, and many useful functions.
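
Several of the features listed above can be combined in one statement. The following is a minimal sketch, assuming hypothetical tables orders_2012, orders_2013, and departments with the columns shown; none of these names come from the text.

```sql
-- Create table as select, built from a "from clause" subquery that
-- combines two hypothetical tables with UNION ALL.
CREATE TABLE dept_order_totals AS
SELECT d.dept_name,
       COUNT(*)      AS order_count,   -- aggregation
       SUM(o.amount) AS total_amount   -- aggregation
FROM (
  SELECT dept_id, amount FROM orders_2012
  UNION ALL
  SELECT dept_id, amount FROM orders_2013
) o
LEFT OUTER JOIN departments d ON (o.dept_id = d.dept_id)
GROUP BY d.dept_name;
```

Hive compiles a statement like this into one or more MapReduce jobs, so the user writes only the SQL-style query.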

Hive examples

Count rows in a table:
SELECT COUNT(1) FROM table2;
SELECT COUNT(*) FROM table2;
Order By - colOrder: ( ASC | DESC )
orderBy: ORDER BY colName colOrder? (',' colName colOrder?)*
query: SELECT expression (',' expression)* FROM src orderBy
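
A concrete query that fits the Order By grammar above might look like the following sketch. It reuses table2 from the count example; the column names col1 and col2 are assumptions made for illustration.

```sql
-- Sort by col1 ascending, then by col2 descending within equal col1 values.
-- colOrder is optional in the grammar; ASC is the default when it is omitted.
SELECT col1, col2
FROM table2
ORDER BY col1 ASC, col2 DESC
LIMIT 10;
```

The LIMIT clause is not part of the grammar shown above. It is included here because Hive rejects an ORDER BY without a LIMIT when hive.mapred.mode is set to strict.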

Chukwa

Chukwa is an open source data collection system for monitoring large distributed
systems. Chukwa is built on top of the Hadoop distributed filesystem (HDFS) and
MapReduce framework. Chukwa also includes a flexible and powerful toolkit for
displaying, monitoring, and analyzing results, to make the best use of the
collected data.

Flume

Flume is a distributed, reliable, and available service for efficiently collecting,
aggregating, and moving large amounts of log data. It has a simple and flexible architecture
based on streaming data flows. It is robust and fault tolerant with tunable reliability
