Page 40 - Building Big Data Applications
YARN: yet another resource negotiator
Hadoop's progress ran into a problem in 2011. Eric Baldeschwieler, then CEO of
Hortonworks, highlighted it: MapReduce had two distinct areas of weakness, one being
scalability and the other the utilization of resources. The goal of the new framework,
named Yet Another Resource Negotiator (YARN), was to introduce an operating system for
Hadoop. An operating system layer in Hadoop provides scalability, performance, and
efficient resource utilization, which in turn has made it possible to implement
architectures for the Internet of Things. The most important concept in YARN is the
ability to implement a data processing paradigm called lazy evaluation and extremely
late binding (we will discuss this in the following chapters); this feature is the
future of data processing and management. With an operating system model, the idea of a
data warehouse becomes far more achievable, because we can move from raw and
operational data to data lakes and data hubs.
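To make lazy evaluation and late binding concrete, here is a minimal Python sketch. It is not taken from YARN or Hadoop; the event format, the `READING_SCHEMA` mapping, and the function names are all hypothetical and chosen only for illustration. Raw records are stored untouched, nothing is parsed until a consumer asks for data, and field types are bound only at read time.

```python
import json
from typing import Callable, Dict, Iterable, Iterator

# Raw events are kept exactly as they arrived; no schema is imposed at write time.
# (Hypothetical sample data, for illustration only.)
RAW_EVENTS = [
    '{"sensor": "a1", "temp": "21.5"}',
    '{"sensor": "b7", "temp": "19.0"}',
    '{"sensor": "a1", "temp": "23.0"}',
]

def read_raw(lines: Iterable[str]) -> Iterator[dict]:
    """Lazy evaluation: a line is parsed only when a consumer pulls it."""
    for line in lines:
        yield json.loads(line)

def bind_schema(records: Iterable[dict],
                schema: Dict[str, Callable]) -> Iterator[dict]:
    """Late binding: field types are applied when the data is read."""
    for rec in records:
        yield {field: cast(rec[field]) for field, cast in schema.items()}

# Hypothetical schema, chosen by the consumer at query time.
READING_SCHEMA = {"sensor": str, "temp": float}

def average_temp(lines: Iterable[str], sensor: str) -> float:
    """Build the lazy pipeline, then consume it to answer one question."""
    typed = bind_schema(read_raw(lines), READING_SCHEMA)
    temps = [r["temp"] for r in typed if r["sensor"] == sensor]
    return sum(temps) / len(temps)

if __name__ == "__main__":
    print(average_temp(RAW_EVENTS, "a1"))
```

Because the schema is an argument rather than a property of the stored data, a different consumer could read the same raw events with a different schema without rewriting them.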
YARN addresses the key issues of Hadoop 1.0, which include the following:
The JobTracker is a major component in data processing because it manages the key tasks
of resource marshaling and job execution at the level of individual tasks. This
component has deficiencies in
Memory consumption
Threading model
Scalability
Reliability
Performance
These issues have been addressed case by case, with several design tweaks made to work
around the shortcomings. The problem manifests in large clusters, where it becomes
difficult to manage (Fig. 2.8).
Overall, issues have been observed in large clustered environments in the areas of
Reliability
Availability
Scalability: clusters of 10,000 nodes and/or 200,000 cores
Evolution: ability for customers to control upgrades to the grid software stack
Predictable latency: a major customer concern
Cluster utilization
Support for alternate programming paradigms to MapReduce
The two major functions of the JobTracker are resource management and job
scheduling/monitoring. Because the design has a single point of control, the load
handled by the JobTracker runs into problems from competing demands for resources and
execution cycles. The fundamental idea of YARN is to split these two major functions
of the JobTracker into separate processes. In the new release architecture,