Page 59 - Building Big Data Applications
Chapter 2 Infrastructure and technology 53
- The final results are stored in a temporary location. When the entire query completes,
the results are either moved into the table (for inserts or partitions) or returned to the
calling program from that temporary location.
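Staging output in a temporary location and publishing it only after the whole query finishes is a write-then-rename pattern. The Python sketch below illustrates the idea on a local filesystem. The helper name `write_results_atomically` and the tab-separated output format are assumptions made for illustration; Hive's own staging on HDFS is more involved.

```python
import os
import tempfile


def write_results_atomically(rows, target_path):
    """Hypothetical helper: stage rows in a temp file, then move it into place."""
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # Stage in the target's directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(prefix=".staging-", dir=target_dir)
    try:
        with os.fdopen(fd, "w") as f:
            for row in rows:
                # Tab-separated text is an assumption made for this sketch.
                f.write("\t".join(str(v) for v in row) + "\n")
        # Publish only after every row is written; os.replace overwrites the target.
        os.replace(tmp_path, target_path)
    except BaseException:
        # On failure, the old target is untouched and the staging file is removed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

For example, `write_results_atomically([(1, "a"), (2, "b")], "result.tsv")` leaves either the previous `result.tsv` or the complete new one, never a half-written file.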
Comparing how Hive executes a query with how a traditional RDBMS does shows that, because of
Hive's schema-on-read design, data placement, partitioning, joins, and storage can be decided
at execution time rather than during up-front planning cycles.
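Schema on read means raw data is stored as it arrives and a schema is applied only when a query reads it. The Python sketch below is a simplified model of that idea, not Hive's SerDe code; the sample lines, column names, and the `read_with_schema` helper are hypothetical. Missing fields and unparseable values become `None`, roughly mirroring how Hive's text format surfaces them as NULL.

```python
# Raw data is stored exactly as it arrived; nothing is validated on write.
RAW_LINES = [
    "1001,alice,42.50",
    "1002,bob,not_a_number",
    "1003,carol",
]


def read_with_schema(lines, schema, delimiter=","):
    """Apply (column_name, converter) pairs to raw lines at read time.

    Missing fields and values that fail conversion become None (NULL).
    """
    for line in lines:
        fields = line.split(delimiter)
        row = {}
        for i, (name, convert) in enumerate(schema):
            try:
                row[name] = convert(fields[i])
            except (IndexError, ValueError):
                row[name] = None
        yield row
```

The same raw lines can be read under a different schema, for example `[("id", int)]`, without rewriting the stored data, which is why layout decisions can be deferred to query time.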
Hive data types
Hive supports the following primitive data types: tinyint, int, smallint, bigint, float,
boolean, string, and double. Special (complex) data types include Array, Map (key-value
pairs), and Struct (a collection of named fields).
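To make the complex types concrete, the Python sketch below parses one row of a text-format table into an Array, a Map, and a Struct. It assumes Hive's default text delimiters (Ctrl-A between columns, Ctrl-B between collection items and struct fields, Ctrl-C between a map key and its value); the table layout and column names are hypothetical. The `fits` helper checks the value ranges of the integer types, which are 1, 2, 4, and 8 bytes wide.

```python
FIELD_DELIM = "\x01"       # Ctrl-A separates top-level columns
COLLECTION_DELIM = "\x02"  # Ctrl-B separates array items, map entries, struct fields
MAP_KEY_DELIM = "\x03"     # Ctrl-C separates a map key from its value

# Hypothetical table:
#   name STRING, scores ARRAY<INT>, props MAP<STRING,STRING>,
#   address STRUCT<city:STRING, zip:STRING>


def parse_row(line):
    name, scores, props, address = line.split(FIELD_DELIM)
    return {
        "name": name,
        "scores": [int(s) for s in scores.split(COLLECTION_DELIM)],
        "props": dict(e.split(MAP_KEY_DELIM, 1) for e in props.split(COLLECTION_DELIM)),
        "address": dict(zip(("city", "zip"), address.split(COLLECTION_DELIM))),
    }


# Signed integer widths in bits for Hive's integer types.
INT_BITS = {"tinyint": 8, "smallint": 16, "int": 32, "bigint": 64}


def fits(type_name, value):
    """Return True if value is within the signed range of the given type."""
    bits = INT_BITS[type_name]
    return -(2 ** (bits - 1)) <= value < 2 ** (bits - 1)
```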
Hive query language (HiveQL)
The Hive query language (HiveQL) is an evolving language that supports much of SQL's
functionality on Hadoop while hiding the complexity of MapReduce from end users. It supports
traditional SQL features such as SELECT, CREATE TABLE, INSERT, subqueries in the FROM clause,
various types of joins (inner, left outer, right outer, and full outer joins), GROUP BY and
aggregations, UNION ALL, CREATE TABLE AS SELECT, and many useful built-in functions.
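As an illustration of the MapReduce work that HiveQL hides, the Python sketch below evaluates a hypothetical query, `SELECT dept, COUNT(*) FROM employees GROUP BY dept;`, as explicit map, shuffle, and reduce phases. This is a simplified conceptual model; Hive's actual execution plan can differ (for example, by aggregating partially on the map side). The table and column names are assumptions.

```python
from collections import defaultdict


def map_phase(rows):
    # Emit (group key, 1) for every input row.
    for row in rows:
        yield row["dept"], 1


def shuffle_phase(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # COUNT(*) per group is the sum of the 1s emitted by the map phase.
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(rows):
    return reduce_phase(shuffle_phase(map_phase(rows)))
```

A HiveQL user writes only the one-line query; the three phases above are what the engine plans and runs on the user's behalf.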
Hive examples
Count rows in a table:
SELECT COUNT(1) FROM table2;
SELECT COUNT(*) FROM table2;
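Both queries above return the number of rows in `table2`: COUNT(*) counts every row, and COUNT(1) counts rows where the constant 1 is non-NULL, which is every row. COUNT(column), by contrast, skips rows where that column is NULL. The Python sketch below models these semantics with `None` standing in for NULL; the `email` column and sample rows are hypothetical.

```python
def count_star(rows):
    """COUNT(*): count every row, regardless of NULLs."""
    return sum(1 for _ in rows)


def count_expr(rows, expr):
    """COUNT(expr): count rows where expr evaluates to a non-NULL value."""
    return sum(1 for row in rows if expr(row) is not None)


# Hypothetical contents of table2; None models NULL.
table2 = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": None},
]
```

Here `count_star(table2)` and `count_expr(table2, lambda r: 1)` both give 3, while `count_expr(table2, lambda r: r["email"])` gives 1.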
Order by:
colOrder: (ASC | DESC)
orderBy: ORDER BY colName colOrder? (',' colName colOrder?)*
query: SELECT expression (',' expression)* FROM src orderBy
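The grammar above allows several sort columns, each with an optional ASC or DESC. The Python sketch below reproduces that multi-column ordering for a hypothetical query such as `SELECT dept, salary FROM employees ORDER BY dept ASC, salary DESC;`. The `order_by` helper and column names are assumptions, and NULL ordering is not modeled.

```python
def order_by(rows, spec):
    """Sort rows like ORDER BY col1 [ASC|DESC], col2 [ASC|DESC], ...

    Each spec item is a column name or a (column, order) pair; a bare name
    defaults to ASC, mirroring the optional colOrder in the grammar.
    """
    result = list(rows)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last yields the combined multi-column ordering.
    for item in reversed(spec):
        col, order = (item, "ASC") if isinstance(item, str) else item
        order = order.upper()
        if order not in ("ASC", "DESC"):
            raise ValueError(f"colOrder must be ASC or DESC, got {order!r}")
        result.sort(key=lambda r: r[col], reverse=(order == "DESC"))
    return result
```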
Chukwa
Chukwa is an open source data collection system for monitoring large distributed systems.
It is built on top of the Hadoop Distributed File System (HDFS) and the MapReduce framework,
and it includes a flexible and powerful toolkit for displaying, monitoring, and analyzing
results so that the collected data can be put to the best use.
Flume
Flume is a distributed, reliable, and available service for efficiently collecting,
aggregating, and moving large amounts of log data. It has a simple and flexible architecture
based on streaming data flows. It is robust and fault tolerant with tunable reliability