Page 43 - Building Big Data Applications

Chapter 2   Infrastructure and technology  37


                 YARN scalability

                 The resource model for YARN is memory driven. Every node in the system is modeled as
                 a set of containers, each of a minimum memory size. The ApplicationMaster can request
                 memory in multiples of that minimum size as needed.
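
                 The multiple-of-minimum rule can be illustrated with a short sketch. It is not YARN's
                 scheduler code: the function name and the sizes are assumptions chosen for
                 illustration. Real clusters configure the bounds through settings such as
                 yarn.scheduler.minimum-allocation-mb, and the exact rounding depends on the scheduler
                 in use.

```python
def normalize_request(requested_mb: int,
                      min_alloc_mb: int = 1024,
                      max_alloc_mb: int = 8192) -> int:
    """Round a memory request up to the next multiple of the minimum size.

    All names and default sizes here are hypothetical, for illustration only.
    """
    if requested_mb <= 0:
        raise ValueError("request must be positive")
    # Integer ceiling division: number of minimum-size units needed.
    units = -(-requested_mb // min_alloc_mb)
    granted_mb = units * min_alloc_mb
    if granted_mb > max_alloc_mb:
        raise ValueError("request exceeds the largest allowed container")
    return granted_mb


# A 1500 MB request needs two 1024 MB units.
granted = normalize_request(1500)
```

                 Under these assumed sizes, a request for 1500 MB is granted as two minimum-size units,
                 so the container receives 2048 MB.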
                   For an application, this means that the memory required to run a job can be
                 obtained from any node, depending on where memory is available. This provides simple,
                 chunkable scalability, especially in a cluster configuration. In classic Hadoop MapReduce,
                 by contrast, the cluster is artificially segregated into map and reduce slots, and
                 application jobs are bottlenecked on reduce slots, which limits the scalability of job
                 execution in the dataflow (Fig. 2.10).
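
                 The difference between fixed slots and a shared memory pool can be seen in a toy
                 model. The classes, slot counts, and memory sizes below are assumptions made up for
                 this sketch and do not correspond to any Hadoop API. Both clusters have the same
                 nominal capacity (eight tasks of 1 GB each), but the slot-based one reserves most of
                 it for map tasks.

```python
from dataclasses import dataclass


@dataclass
class SlotNode:
    """Classic-style node: capacity is pre-divided into map and reduce slots."""
    free_map_slots: int
    free_reduce_slots: int

    def try_start(self, task_kind: str) -> bool:
        if task_kind == "map" and self.free_map_slots > 0:
            self.free_map_slots -= 1
            return True
        if task_kind == "reduce" and self.free_reduce_slots > 0:
            self.free_reduce_slots -= 1
            return True
        return False


@dataclass
class MemoryNode:
    """YARN-style node: one pool of memory shared by every kind of task."""
    free_mb: int

    def try_start(self, container_mb: int) -> bool:
        if container_mb <= self.free_mb:
            self.free_mb -= container_mb
            return True
        return False


def started_on_slots(nodes, tasks):
    """Count tasks that find a free slot of the matching kind on some node."""
    return sum(any(node.try_start(kind) for node in nodes) for kind in tasks)


def started_on_memory(nodes, tasks, container_mb=1024):
    """Count tasks that find enough free memory on some node."""
    return sum(any(node.try_start(container_mb) for node in nodes) for _ in tasks)


tasks = ["reduce"] * 6                                   # a reduce-heavy phase
slot_cluster = [SlotNode(3, 1), SlotNode(3, 1)]          # 8 slots, only 2 for reduce
memory_cluster = [MemoryNode(4096), MemoryNode(4096)]    # 8 GB shared pool

slots_started = started_on_slots(slot_cluster, tasks)
memory_started = started_on_memory(memory_cluster, tasks)
print(slots_started, memory_started)  # prints "2 6"
```

                 In this model only two of the six reduce tasks can start on the slot-based cluster,
                 while six map slots sit idle. The memory-based cluster starts all six.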


                 YARN execution flow

                 Comparison between MapReduce v1 and v2

                 Presented here is a simple comparison between the two releases of MapReduce:

                   Classic MapReduce                          YARN
                   -  Job request submitted to JobTracker     -  Application executed by YARN
                   -  JobTracker manages the execution        -  Resources negotiated and allocated prior
                      with tasks                                 to job execution
                   -  Resources are allocated on an           -  Map based resource request setup for
                      availability basis, some jobs get          the entire job
                      more and others less                    -  Resource monitor tracks usage and requests
                   -  Resource allocation across a cluster       additional resource as needed from across
                   -  Multiple single points of failure          a cluster in a clustered setup
                                                              -  Job completion and cleanup tasks are executed



                 SQL/MapReduce

                 Business intelligence has been one of the most successful applications of the last decade,
                 but severe performance limitations have been a bottleneck, especially for detailed data
                 analysis. The problem is compounded by analytics and the need for a 360-degree
                 perspective on customers and products, with ad hoc analysis demands from users.
                 Extending SQL to MapReduce is a powerful combination that enables users to
                 explore larger volumes of raw data through normal SQL functions and regular BI tools.
                 This is the fundamental concept behind SQL/MapReduce. There are a few popular
                 implementations of SQL/MapReduce, including Hive, AsterData, Greenplum, and
                 HadoopDB.
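
                 To make the idea concrete, the sketch below shows how a simple aggregate query could
                 be expressed as map, shuffle, and reduce steps. It is a conceptual illustration only
                 and does not represent how Hive or any other product listed above generates its jobs.
                 The sales table, its columns, and the function names are hypothetical.

```python
from collections import defaultdict

# Query being illustrated (hypothetical table and columns):
#   SELECT region, SUM(amount) FROM sales GROUP BY region


def map_phase(rows):
    """Emit one (key, value) pair per row: the GROUP BY column and the summed column."""
    for row in rows:
        yield row["region"], row["amount"]


def shuffle(pairs):
    """Group all values that share a key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply the aggregate function (SUM) to each group of values."""
    return {key: sum(values) for key, values in groups.items()}


sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 20},
]

totals = reduce_phase(shuffle(map_phase(sales)))
print(totals)  # prints "{'east': 30, 'west': 5}"
```

                 The GROUP BY column becomes the map output key and the aggregate function becomes the
                 reduce step, which is why such queries fit the MapReduce model naturally.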
                   Fig. 2.5 shows a conceptual architecture of an SQL/MapReduce implementation.
                 There are a few important components to understand:
                   Translator: this is a custom layer provided by the solution. It can simply be a
                   library of functions to extend in the current database environment