Redundancy: hardware will be prone to failure and processes can run out of infrastructure resources, but redundancy built into the design can handle these situations (a short replication sketch follows this list).
Scalability: linear scalability at the storage layer is needed to utilize parallel processing at its optimum level; the design goal is 100% linear scalability.
Fault tolerance: the automatic ability to recover from failure.
Cross-platform compatibility.
Compute and storage in one environment: data and computation colocated in the same architecture remove a lot of redundant I/O and disk access.
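As a rough illustration of how the redundancy and fault-tolerance principles are expressed in practice, the sketch below uses the standard Hadoop FileSystem API in Java to write a file whose blocks are replicated to three DataNodes. The cluster address and the output path are hypothetical, and the replication factor of 3 is simply the common default; this is a minimal sketch, not the book's own example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicatedWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");  // hypothetical NameNode address
        conf.set("dfs.replication", "3");                  // each block stored on 3 DataNodes

        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/data/events/part-0000");     // hypothetical output path

        // Every block of this file is replicated to 3 DataNodes, so the loss of a
        // single node or disk does not lose data; redundancy is built into the design.
        try (FSDataOutputStream stream = fs.create(out)) {
            stream.writeBytes("example record\n");
        }
        fs.close();
    }
}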
The three principal goals of HDFS are the following:
Process extremely large files: ranging from multiple gigabytes to petabytes.
Streaming data processing: read data at high throughput rates and process data on read (a read sketch follows this list).
Capability to execute on commodity hardware: no special hardware requirements.
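To make the "process data on read" goal concrete, here is a minimal sketch that streams a file from HDFS and handles it line by line as the bytes arrive, instead of loading it into memory. It uses the standard Hadoop FileSystem API; the path and the placeholder line counting are hypothetical.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StreamingRead {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path in = new Path("/data/logs/2020/access.log");  // hypothetical multi-gigabyte file

        long lineCount = 0;
        // fs.open returns a stream, and the data is processed as it is read,
        // which is what lets HDFS clients sustain high read throughput on very large files.
        try (FSDataInputStream stream = fs.open(in);
             BufferedReader reader = new BufferedReader(new InputStreamReader(stream))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineCount++;                                // placeholder for real processing
            }
        }
        System.out.println("lines processed: " + lineCount);
    }
}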
These capabilities and goals form the robust platform for data processing that exists
in the Hadoop platform today.
HDFS architecture
Fig. 2.7 shows the architecture of HDFS. The underlying architecture of HDFS is a master/slave architecture. The main components of HDFS are the following (a short client sketch after the list shows how they interact):
NameNode
DataNode
Image
Journal
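As a rough illustration of the master/slave split, the sketch below asks the cluster, through the standard Hadoop FileSystem client, where the blocks of a file live; this is a metadata query answered by the NameNode, while the block contents themselves would be read from the listed DataNodes. The file path is hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLookup {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/events/part-0000");     // hypothetical file

        // The NameNode (master) answers metadata queries such as which DataNodes
        // hold which blocks; the DataNodes (slaves) serve the block contents.
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                    + " length " + block.getLength()
                    + " hosts " + String.join(",", block.getHosts()));
        }
    }
}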
NameNode
The NameNode is a single master server that manages the filesystem namespace and regulates access to files by clients. Additionally, the NameNode manages all the metadata operations in the cluster, such as opening, closing, and renaming files and directories, and maintains the mapping of file blocks to DataNodes.
FIGURE 2.7 HDFS architecture.
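The calls below are pure namespace operations; in a running cluster each of them is handled by the NameNode, which records the change in its in-memory namespace (persisted through the image and journal) without moving any file data on the DataNodes. This is a minimal sketch with hypothetical directory and file names, again using the standard FileSystem API.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceOps {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Creating, renaming, and listing directories only reads or changes the
        // namespace kept by the NameNode; no block data is touched.
        fs.mkdirs(new Path("/data/staging"));                         // hypothetical paths
        fs.rename(new Path("/data/staging"), new Path("/data/ready"));

        for (FileStatus entry : fs.listStatus(new Path("/data"))) {
            System.out.println(entry.getPath() + " dir=" + entry.isDirectory());
        }
    }
}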