Page 34 - Building Big Data Applications


               Redundancy: hardware is prone to failure and processes can run out of
                infrastructure resources, but redundancy built into the design can handle these
                situations
               Scalability: linear scalability at the storage layer is needed to use parallel
                processing at its optimum level, so the design targets 100% linear scalability
               Fault tolerance: automatic ability to recover from failure
               Cross-platform compatibility
               Compute and storage in one environment: colocating data and computation in the
                same architecture removes much of the redundant I/O and disk access
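The redundancy, fault tolerance, and colocation principles above can be illustrated with a toy model. This is a minimal sketch, not HDFS code: the `Cluster` class, its methods, and the node and block names are hypothetical, and the replication factor of 3 is an assumption (it matches HDFS's usual default). Each block is stored on several distinct nodes, a node failure triggers automatic re-replication from surviving copies, and tasks are scheduled on a node that already holds the data.

```python
import random

REPLICATION = 3  # assumed replication factor (HDFS's usual default is 3)


class Cluster:
    """Hypothetical toy model of replica placement, recovery, and data-local scheduling."""

    def __init__(self, nodes, seed=0):
        self.live = set(nodes)            # nodes currently up
        self.blocks = {}                  # block id -> set of nodes holding a replica
        self.rng = random.Random(seed)    # seeded so placement is reproducible

    def store(self, block_id):
        # Redundancy: place each replica on a different live node.
        self.blocks[block_id] = set(self.rng.sample(sorted(self.live), REPLICATION))

    def fail(self, node):
        # A node dies and every replica it held is lost.
        self.live.discard(node)
        for holders in self.blocks.values():
            holders.discard(node)
        self.re_replicate()

    def re_replicate(self):
        # Fault tolerance: copy under-replicated blocks to other live nodes.
        for holders in self.blocks.values():
            if not holders:
                continue  # all replicas lost; nothing left to copy from
            missing = REPLICATION - len(holders)
            candidates = sorted(self.live - holders)
            holders.update(candidates[:missing])

    def schedule(self, block_id):
        # Colocation: run the task on a node that already stores the block,
        # so the data does not have to cross the network.
        return min(self.blocks[block_id])


if __name__ == "__main__":
    cluster = Cluster([f"dn{i}" for i in range(5)])
    cluster.store("blk_1")
    cluster.fail(min(cluster.blocks["blk_1"]))
    print(cluster.blocks["blk_1"], cluster.schedule("blk_1"))
```

Real HDFS placement is rack-aware and driven by DataNode heartbeats; the sketch only shows why keeping several copies lets the system absorb a failure without losing data.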
                The three principal goals of HDFS are the following:
               Process extremely large files, ranging from multiple gigabytes to petabytes
               Streaming data processing: read data at high throughput rates and process data
                on read
               Capability to execute on commodity hardware: no special hardware requirements
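Handling very large files and processing data on read both come down to consuming a file as a sequence of large, fixed-size blocks rather than loading it whole. The sketch below is a local-file illustration of that idea, not the HDFS client API: `read_blocks` and `count_records` are hypothetical helpers, and the 128 MB block size is an assumption (recent HDFS releases default to 128 MB, configurable through `dfs.blocksize`).

```python
import io
from typing import BinaryIO, Iterator

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size of 128 MB


def read_blocks(stream: BinaryIO, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield the stream as consecutive fixed-size blocks; the last may be shorter."""
    while True:
        block = stream.read(block_size)
        if not block:
            return
        yield block


def count_records(stream: BinaryIO, block_size: int = BLOCK_SIZE) -> int:
    """Process data on read: count newline-terminated records one block at a time,
    so memory use is bounded by the block size, not the file size."""
    return sum(block.count(b"\n") for block in read_blocks(stream, block_size))


if __name__ == "__main__":
    data = io.BytesIO(b"alpha\nbeta\ngamma\n")
    print(count_records(data, block_size=4))
```

Because each block is read sequentially and discarded after it is processed, throughput depends on sequential read speed rather than random seeks, which is the access pattern HDFS is designed around.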

                These capabilities and goals form the foundation of the robust data processing
             platform that Hadoop provides today.

             HDFS architecture
             Fig. 2.7 shows the architecture of HDFS, which follows a master/slave design. The
             main components of HDFS are the following:
               NameNode
               DataNode
               Image
               Journal
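How the master's components fit together can be shown with a toy model: the NameNode keeps the namespace in memory, the image is a saved checkpoint of that namespace, and the journal records every change made since the checkpoint, so the namespace can be rebuilt after a restart by loading the image and replaying the journal. This is a minimal sketch, not HDFS's implementation; the `NameNode` class, its methods, and the paths and block IDs are hypothetical, and DataNodes (which store the actual block contents) are not modeled.

```python
import copy
from typing import Dict, List, Tuple

Op = Tuple[str, str, List[str]]  # (operation, path, block ids)


class NameNode:
    """Hypothetical toy master: in-memory namespace persisted as image + journal."""

    def __init__(self):
        self.namespace: Dict[str, List[str]] = {}  # path -> block ids
        self.image: Dict[str, List[str]] = {}      # last checkpoint of the namespace
        self.journal: List[Op] = []                # changes since that checkpoint

    def _apply(self, op: Op) -> None:
        kind, path, blocks = op
        if kind == "create":
            self.namespace[path] = list(blocks)
        elif kind == "delete":
            self.namespace.pop(path, None)

    def _record(self, op: Op) -> None:
        # Every namespace change is both journaled and applied in memory.
        self.journal.append(op)
        self._apply(op)

    def create(self, path: str, blocks: List[str]) -> None:
        self._record(("create", path, list(blocks)))

    def delete(self, path: str) -> None:
        self._record(("delete", path, []))

    def checkpoint(self) -> None:
        # Fold the journal into a fresh image and start an empty journal.
        self.image = copy.deepcopy(self.namespace)
        self.journal = []

    def recover(self) -> None:
        # Rebuild the namespace: load the image, then replay the journal in order.
        self.namespace = copy.deepcopy(self.image)
        for op in self.journal:
            self._apply(op)


if __name__ == "__main__":
    nn = NameNode()
    nn.create("/logs/a", ["blk_1", "blk_2"])
    nn.checkpoint()
    nn.create("/logs/b", ["blk_3"])
    nn.namespace = {}  # simulate losing in-memory state on a crash
    nn.recover()
    print(nn.namespace)
```

Checkpointing keeps the journal short, which bounds how long replay takes when the master restarts.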

             NameNode


             The NameNode is a single master server that manages the filesystem namespace and
             regulates access to files by clients. Additionally, the NameNode manages all the

                                           FIGURE 2.7 HDFS architecture.