FIGURE 2.5 Apache top-level Hadoop projects.

                 HDFS

The biggest problem faced by the early developers of large-scale data processing was how to break files down across multiple systems and process each piece of a file independently of the others, yet consolidate the results into a single result set. The second problem that remained unsolved was fault tolerance, both at the file-processing level and at the overall system level in distributed processing systems.
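
This split-process-consolidate pattern is what Hadoop's MapReduce programming model formalizes. As an illustration only, not taken from this book, the canonical word-count job below processes each input split independently in mappers and consolidates the partial counts in a reducer:

// Illustrative sketch: the canonical Hadoop MapReduce word-count job.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Each mapper processes one split of the input independently of the others.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);   // emit (word, 1) for each token
      }
    }
  }

  // The reducer consolidates the partial results into a single result set.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

If any machine processing a split fails, the framework simply reschedules that split on another node, which is how the fault-tolerance problem described above is addressed at the file-processing level.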
With GFS, the problem of scaling out processing across multiple systems was largely solved. HDFS, which is derived from NDFS, was designed to solve the large-scale distributed data processing problem. Some of the fundamental design principles of HDFS are the following:

FIGURE 2.6 Core Hadoop components (circa 2017).
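
As a brief aside, here is a minimal sketch, not taken from this book, of how an application writes and reads an HDFS file through the org.apache.hadoop.fs.FileSystem client API; the cluster URI and file path shown are hypothetical:

// Illustrative sketch of the HDFS Java client API; URI and path are hypothetical.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // fs.defaultFS normally comes from core-site.xml; set here for clarity.
    conf.set("fs.defaultFS", "hdfs://namenode:8020");   // hypothetical cluster
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/user/demo/sample.txt");      // hypothetical path

    // Write: the client streams data to DataNodes; HDFS replicates each
    // block (three copies by default) for fault tolerance.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
    }

    // Read: the NameNode supplies block locations, and the data is read
    // directly from the DataNodes that hold the blocks.
    try (FSDataInputStream in = fs.open(file);
         BufferedReader reader =
             new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      System.out.println(reader.readLine());
    }

    fs.close();
  }
}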