Redundancy: hardware will be prone to failure and processes can run out of infrastructure resources, but redundancy built into the design can handle these situations (a short replication sketch follows this list).
Scalability: linear scalability at the storage layer is needed to utilize parallel processing at its optimum level; the design goal is 100% linear scalability.
Fault tolerance: the automatic ability to recover from failure.
Cross-platform compatibility.
Compute and storage in one environment: data and computation colocated in the same architecture remove a lot of redundant I/O and disk access.
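As a rough illustration of how the redundancy and fault-tolerance principles are expressed in practice, the sketch below uses the standard Hadoop FileSystem API in Java to write a file whose blocks are replicated to three DataNodes. The cluster address and the output path are hypothetical, and the replication factor of 3 is simply the common default; this is a minimal sketch, not the book's own example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicatedWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");  // hypothetical NameNode address
        conf.set("dfs.replication", "3");                  // each block stored on 3 DataNodes

        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/data/events/part-0000");     // hypothetical output path

        // Every block of this file is replicated to 3 DataNodes, so the loss of a
        // single node or disk does not lose data; redundancy is built into the design.
        try (FSDataOutputStream stream = fs.create(out)) {
            stream.writeBytes("example record\n");
        }
        fs.close();
    }
}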
The three principal goals of HDFS are the following:
Process extremely large files: ranging from multiple gigabytes to petabytes.
Streaming data processing: read data at high throughput rates and process data on read (a read sketch follows this list).
Capability to execute on commodity hardware: no special hardware requirements.
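To make the "process data on read" goal concrete, here is a minimal sketch that streams a file from HDFS and handles it line by line as the bytes arrive, instead of loading it into memory. It uses the standard Hadoop FileSystem API; the path and the placeholder line counting are hypothetical.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StreamingRead {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path in = new Path("/data/logs/2020/access.log");  // hypothetical multi-gigabyte file

        long lineCount = 0;
        // fs.open returns a stream, and the data is processed as it is read,
        // which is what lets HDFS clients sustain high read throughput on very large files.
        try (FSDataInputStream stream = fs.open(in);
             BufferedReader reader = new BufferedReader(new InputStreamReader(stream))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineCount++;                                // placeholder for real processing
            }
        }
        System.out.println("lines processed: " + lineCount);
    }
}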
These capabilities and goals form the robust platform for data processing that exists
in the Hadoop platform today.
HDFS architecture
Fig. 2.7 shows the architecture of HDFS. The underlying architecture of HDFS is a master/slave architecture. The main components of HDFS are the following (a short client sketch after the list shows how they interact):
NameNode
DataNode
Image
Journal
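As a rough illustration of the master/slave split, the sketch below asks the cluster, through the standard Hadoop FileSystem client, where the blocks of a file live; this is a metadata query answered by the NameNode, while the block contents themselves would be read from the listed DataNodes. The file path is hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLookup {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/events/part-0000");     // hypothetical file

        // The NameNode (master) answers metadata queries such as which DataNodes
        // hold which blocks; the DataNodes (slaves) serve the block contents.
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                    + " length " + block.getLength()
                    + " hosts " + String.join(",", block.getHosts()));
        }
    }
}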
NameNode
The NameNode is a single master server that manages the filesystem namespace and regulates access to files by clients. Additionally, the NameNode manages all the metadata operations in the cluster, such as opening, closing, and renaming files and directories, and maintains the mapping of file blocks to DataNodes.
FIGURE 2.7 HDFS architecture.
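The calls below are pure namespace operations; in a running cluster each of them is handled by the NameNode, which records the change in its in-memory namespace (persisted through the image and journal) without moving any file data on the DataNodes. This is a minimal sketch with hypothetical directory and file names, again using the standard FileSystem API.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceOps {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Creating, renaming, and listing directories only reads or changes the
        // namespace kept by the NameNode; no block data is touched.
        fs.mkdirs(new Path("/data/staging"));                         // hypothetical paths
        fs.rename(new Path("/data/staging"), new Path("/data/ready"));

        for (FileStatus entry : fs.listStatus(new Path("/data"))) {
            System.out.println(entry.getPath() + " dir=" + entry.isDirectory());
        }
    }
}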