Page 43 - Building Big Data Applications


Chapter 2 Infrastructure and technology

YARN scalability

The resource model for YARN is memory driven. Every node in the system is modeled as
a set of containers, each with a minimum memory size. The ApplicationMaster can
request memory in multiples of that minimum size as needed.
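
The arithmetic behind this model can be sketched in a few lines of Python. The function names and the 1024 MB minimum used in the example are illustrative assumptions, not values from the text or from a YARN API. The sketch only shows how a request is rounded up to a whole multiple of the minimum container size.

```python
def containers_needed(requested_mb: int, min_container_mb: int) -> int:
    """Return how many minimum-size memory units cover the request."""
    if requested_mb <= 0 or min_container_mb <= 0:
        raise ValueError("memory sizes must be positive")
    # Ceiling division: a partially used unit still occupies a whole unit.
    return -(-requested_mb // min_container_mb)


def granted_memory_mb(requested_mb: int, min_container_mb: int) -> int:
    """Return the request rounded up to a multiple of the minimum size."""
    return containers_needed(requested_mb, min_container_mb) * min_container_mb


# Hypothetical example: a 2500 MB request against a 1024 MB minimum.
units = containers_needed(2500, 1024)
granted = granted_memory_mb(2500, 1024)
```

Under these assumed numbers the request occupies three minimum-size units, so a little memory is granted beyond what was asked for. That rounding is the cost of working in fixed-size chunks.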
For an application, this means that the memory slots required to run a job can be
obtained from any node, depending on where memory is available. This provides simple,
chunkable scalability, especially in a cluster configuration. In classic Hadoop
MapReduce, by contrast, the cluster is artificially segregated into map and reduce
slots, and application jobs are bottlenecked on reduce slots, which limits the
scalability of job execution in the dataflow (Fig. 2.10).
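
The reduce-slot bottleneck can be illustrated with a toy calculation. This is a minimal sketch under simplifying assumptions: every task fits in exactly one slot or container, and the cluster sizes and task counts are made up for illustration. The function names are hypothetical and do not correspond to any Hadoop API.

```python
def concurrent_tasks_fixed_slots(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Tasks that can run at once when slots are typed (classic model)."""
    # Idle map slots cannot be lent to waiting reduce tasks, and vice versa.
    return min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)


def concurrent_tasks_pooled(map_tasks, reduce_tasks, containers):
    """Tasks that can run at once when any container can host any task."""
    return min(map_tasks + reduce_tasks, containers)


# Hypothetical cluster: 12 units of capacity, split 8 map / 4 reduce in the
# slot model, or offered as 12 interchangeable containers.
# Hypothetical workload: 2 map tasks and 10 reduce tasks are ready to run.
fixed = concurrent_tasks_fixed_slots(2, 10, map_slots=8, reduce_slots=4)
pooled = concurrent_tasks_pooled(2, 10, containers=12)
```

With these assumed numbers the slot model runs 6 tasks while 6 map slots sit idle, whereas the pooled model runs all 12.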

YARN execution flow

Comparison between MapReduce v1 and v2

Presented here is a simple comparison between the two releases of MapReduce.

Classic MapReduce
- Job request submitted to JobTracker
- JobTracker manages the execution with tasks
- Resources are allocated on availability basis, some jobs get more and others less
- Resource allocation across a cluster
- Multiple single points of failure

YARN
- Application executed by YARN
- Resources negotiated and allocated prior to job execution
- Map based resource request setup for the entire job
- Resource monitor tracks usage and requests additional resource as needed from across a cluster in a clustered setup
- Job completion and cleanup tasks are executed

SQL/MapReduce

Business intelligence has been one of the most successful applications of the last
decade, but severe performance limitations have been a bottleneck, especially for
detailed data analysis. The problem is compounded by analytics and the need for a
360-degree perspective on customers and products, with ad hoc analysis demands from
users. Extending SQL to MapReduce is a powerful combination that enables users to
explore larger volumes of raw data through normal SQL functions and regular BI tools.
This is the fundamental concept behind SQL/MapReduce. There are a few popular
implementations of SQL/MapReduce, including Hive, AsterData, Greenplum, and
HadoopDB.
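
To make the idea concrete, the following Python sketch shows how a simple SQL aggregate could be expressed as map, shuffle, and reduce steps. It is an in-memory illustration only, not how Hive or any of the products named above actually plans or executes queries. The table contents and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of a "sales" table: (product, amount).
SALES = [("widget", 10), ("gadget", 5), ("widget", 7), ("gadget", 1), ("gizmo", 3)]

# Query being illustrated:
#   SELECT product, SUM(amount) FROM sales GROUP BY product;


def map_phase(rows):
    """Emit one (key, value) pair per row; the key is the GROUP BY column."""
    for product, amount in rows:
        yield product, amount


def shuffle_phase(pairs):
    """Collect all values that share a key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Apply the aggregate function (SUM) to each key's values."""
    return {key: sum(values) for key, values in grouped.items()}


result = reduce_phase(shuffle_phase(map_phase(SALES)))
```

The GROUP BY column becomes the map output key and the aggregate becomes the reduce function, so the user can keep writing SQL while the work is spread across map and reduce tasks.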
Fig. 2.5 shows a conceptual architecture of an SQL/MapReduce implementation.
There are a few important components to understand:
Translator: this is a custom layer provided by the solution. It can simply be a
library of functions to extend in the current database environment
