Page 40 - Building Big Data Applications


             YARN: yet another resource negotiator

             By 2011, Hadoop's progress had run into a problem. Eric Baldeschwieler, then CEO of
             Hortonworks, highlighted it: MapReduce had two distinct areas of weakness, scalability
             and resource utilization. The goal of the new framework, titled Yet Another Resource
             Negotiator (YARN), was to introduce an operating system for Hadoop. An operating
             system in Hadoop ensures scalability, performance, and resource utilization, which in
             turn has made it possible to implement architectures for the Internet of Things. The most
             important concept in YARN is the ability to implement a data processing paradigm
             called lazy evaluation and extremely late binding (we will discuss this throughout the
             following chapters), a feature that is the future of data processing and management.
             With an operating system model, the idea of a data warehouse becomes very much
             achievable, as we can go from raw and operational data to data lakes and data hubs.
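As a generic illustration of lazy evaluation and late binding, and not of YARN or any Hadoop API, the following minimal Python sketch stores raw records untouched and applies a schema only when a record is actually read. The sample data, the `read_with_schema` helper, and the `stats` counter are all hypothetical names introduced for this example.

```python
import json

# Raw data is kept exactly as it arrived; no schema is imposed at load time.
RAW_EVENTS = [
    '{"sensor": "s1", "temp": "21.5"}',
    '{"sensor": "s2", "temp": "19.0"}',
    '{"sensor": "s3", "temp": "30.2"}',
]

# Counts how many raw lines have actually been parsed (illustration only).
stats = {"parsed": 0}


def read_with_schema(lines, schema):
    """Lazily parse raw lines, binding the schema only as records are consumed."""
    for line in lines:
        record = json.loads(line)
        stats["parsed"] += 1
        # Late binding: types are applied at read time, not at storage time.
        yield {name: cast(record[name]) for name, cast in schema.items()}


schema = {"sensor": str, "temp": float}

# Creating the generator does no work yet (lazy evaluation).
records = read_with_schema(RAW_EVENTS, schema)

# Only as many lines are parsed as the query needs.
first_hot = next(r for r in records if r["temp"] > 20.0)
print(first_hot, stats["parsed"])
```

Because the first record already satisfies the filter, only one raw line is parsed; the remaining lines stay untouched until something asks for them.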
                YARN addresses the key issues of Hadoop 1.0, which include the following:

               The JobTracker is a major component in data processing, as it manages the key tasks
                of resource marshaling and job execution at the individual task level. It has
                deficiencies in
                  Memory consumption
                  Threading-model
                  Scalability
                  Reliability
                  Performance

                These issues have been addressed case by case, and several design tweaks have
             been made to work around the shortcomings. The problem manifests in large clusters,
             where it becomes difficult to manage (Fig. 2.8).
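The JobTracker's combined role described above can be pictured with a toy model. This is a minimal sketch, not Hadoop code: `ToyJobTracker`, its fields, and the node and job names are hypothetical, and the scheduling policy (first node with a free slot) is an assumption chosen only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJobTracker:
    """Toy stand-in for a single process that owns both responsibilities."""
    free_slots: dict = field(default_factory=dict)  # node -> free task slots
    task_state: dict = field(default_factory=dict)  # (job, task) -> (node, status)

    def register_node(self, node, slots):
        # Responsibility 1: resource marshaling (which node has capacity).
        self.free_slots[node] = slots

    def submit_job(self, job, num_tasks):
        # Responsibility 2: job execution tracked at the individual task level.
        for task in range(num_tasks):
            node = next((n for n, s in self.free_slots.items() if s > 0), None)
            if node is None:
                self.task_state[(job, task)] = (None, "PENDING")
            else:
                self.free_slots[node] -= 1
                self.task_state[(job, task)] = (node, "RUNNING")

    def tracked_entries(self):
        # All bookkeeping lives in this one process, so its state grows with
        # both the number of nodes and the number of tasks in flight.
        return len(self.free_slots) + len(self.task_state)


tracker = ToyJobTracker()
for i in range(4):
    tracker.register_node(f"node-{i}", slots=2)
tracker.submit_job("job-a", num_tasks=6)
tracker.submit_job("job-b", num_tasks=5)

pending = sum(1 for _, status in tracker.task_state.values() if status == "PENDING")
print(tracker.tracked_entries(), pending)
```

Even in this small run the single object carries every node record and every task record, and jobs compete for the same slot table. That is the kind of pressure that grows with cluster size.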
               Overall, issues have been observed in large clustered environments in the areas of
                  Reliability
                  Availability
                  Scalability: clusters of 10,000 nodes and/or 200,000 cores
                  Evolution: ability for customers to control upgrades to the grid software stack
                  Predictable latency: a major customer concern
                  Cluster utilization
                  Support for alternate programming paradigms to MapReduce
                The two major functions of the JobTracker are resource management and job
             scheduling/monitoring. Because the design has a single point of control, the load
             handled by the JobTracker runs into problems from competing demands for resources
             and execution cycles. The fundamental idea of YARN is to split these two major
             functions of the JobTracker into separate processes. In the new release architecture,