Page 43 - Building Big Data Applications
Chapter 2 Infrastructure and technology
YARN scalability
The resource model for YARN is memory driven. Every node in the system is modeled as
a set of containers, each of a minimum memory size. The ApplicationMaster can request
memory in multiples of that minimum size as needed.
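The "multiples of the minimum size" rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not YARN's own code: the function names and the 1024 MB minimum are assumptions chosen for the example.

```python
import math

# Assumed minimum container size in MB; a real cluster configures its own value.
MIN_ALLOCATION_MB = 1024


def normalize_request(requested_mb: int, min_mb: int = MIN_ALLOCATION_MB) -> int:
    """Round a memory request up to the nearest multiple of the minimum size."""
    if requested_mb <= 0:
        raise ValueError("requested memory must be positive")
    units = math.ceil(requested_mb / min_mb)
    return units * min_mb


def containers_per_node(node_mb: int, min_mb: int = MIN_ALLOCATION_MB) -> int:
    """Number of minimum-size containers that fit in a node's memory."""
    return node_mb // min_mb
```

Under these assumptions, a request for 1500 MB would be granted as two minimum-size units (2048 MB), and a node with 8192 MB would be modeled as eight minimum-size containers.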
For an application, this means the memory slots required to run a job can be obtained
from any node, depending on where memory is available. This provides simple, chunkable
scalability, especially in a cluster configuration. In classic Hadoop MapReduce, by
contrast, the cluster is artificially segregated into map and reduce slots, and application
jobs are bottlenecked on reduce slots, which limits the scalability of job execution in the
dataflow (Fig. 2.10).
YARN execution flow
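The difference between fixed map/reduce slots and memory-based containers can be shown with a toy model. The sketch below is a deliberate simplification: the function names, slot counts, and memory sizes are hypothetical, and real schedulers consider far more than free memory.

```python
def run_with_slots(map_slots, reduce_slots, map_tasks, reduce_tasks):
    """Classic model: a task can only start in a slot of its own type."""
    started_maps = min(map_slots, map_tasks)
    started_reduces = min(reduce_slots, reduce_tasks)
    return {
        "running": started_maps + started_reduces,
        "idle_slots": (map_slots - started_maps) + (reduce_slots - started_reduces),
        "waiting": (map_tasks - started_maps) + (reduce_tasks - started_reduces),
    }


def run_with_containers(free_mb_per_node, task_mb, total_tasks):
    """Container model: any task may use free memory on any node."""
    running = 0
    for free_mb in free_mb_per_node:
        fits_here = free_mb // task_mb  # containers this node can still host
        running += min(fits_here, total_tasks - running)
    return {"running": running, "waiting": total_tasks - running}


# A job in its reduce phase: no map tasks left, six reduce tasks pending.
slot_result = run_with_slots(map_slots=6, reduce_slots=2, map_tasks=0, reduce_tasks=6)
container_result = run_with_containers([4096, 4096], task_mb=1024, total_tasks=6)
print(slot_result)
print(container_result)
```

In this example the slot-based cluster runs only two reduce tasks while six map slots sit idle and four tasks wait; the container-based cluster, with the same assumed capacity expressed as memory, runs all six.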
Comparison between MapReduce v1 and v2
Presented here is a simple comparison between the two releases of MapReduce:

classic MapReduce
- Job request submitted to JobTracker
- JobTracker manages the execution with tasks
- Resources are allocated on availability basis, some jobs get more and others less
- Resource allocation across a cluster
- Multiple single points of failure

YARN
- Application executed by YARN
- Resources negotiated and allocated prior to job execution
- Map based resource request setup for the entire job
- Resource monitor tracks usage and requests additional resource as needed from across a cluster in a clustered setup
- Job completion and cleanup tasks are executed
SQL/MapReduce
Business intelligence has been one of the most successful applications of the last decade,
but severe performance limitations have been a bottleneck, especially for detailed data
analysis. The problem is compounded by analytics and by the need for a 360-degree
perspective on customers and products, with ad-hoc analysis demands from users.
Extending SQL to MapReduce enables users to explore larger volumes of raw data
through normal SQL functions and regular BI tools. This is the fundamental concept
behind SQL/MapReduce. There are a few popular implementations of SQL/MapReduce,
including Hive, AsterData, Greenplum, and HadoopDB.
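To make the idea concrete, here is a minimal in-memory sketch of how a query such as `SELECT customer, SUM(amount) FROM sales GROUP BY customer` could be expressed as map, shuffle, and reduce steps. It is an illustration only, not how any of the products named above actually translate SQL; the `SALES` records and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records: (customer, product, amount)
SALES = [
    ("alice", "book", 12.0),
    ("bob", "pen", 3.0),
    ("alice", "pen", 3.0),
    ("bob", "book", 12.0),
    ("carol", "book", 12.0),
]


def map_phase(record):
    """Emit (key, value): the GROUP BY column and the column being summed."""
    customer, _product, amount = record
    yield customer, amount


def shuffle(pairs):
    """Group values by key, as happens between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Apply the aggregate function (SUM) to one group."""
    return key, sum(values)


def sql_sum_group_by(records):
    """Run the three steps in sequence and return {customer: total}."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


totals = sql_sum_group_by(SALES)
print(totals)
```

The GROUP BY column becomes the map output key and the aggregate becomes the reduce function, which is why familiar SQL functions map naturally onto this execution model.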
Fig. 2.5 shows a conceptual architecture of an SQL/MapReduce implementation.
There are a few important components to understand:
Translator: this is a custom layer provided by the solution. It can simply be a
library of functions to extend in the current database environment