Page 26 - Building Big Data Applications


             were designed and implemented for solving large-scale distributed processing by leading
             companies including Google, Yahoo, Facebook, and Amazon. The fundamental tenets
             common to this new architecture are:
               • Extreme parallel processing: the ability to process data in parallel within a
                 system and across multiple systems at the same time.
               • Minimal database usage: an RDBMS or DBMS will not be the central engine in the
                 processing, which removes the architectural limitations imposed by database
                 ACID compliance.
               • Distributed file-based storage: data is stored in files, which is cheaper than
                 storing it in a database. Additionally, data is distributed across systems,
                 providing built-in redundancy.
               • Linearly scalable infrastructure: every piece of infrastructure added
                 contributes its full capacity, so CPU, storage, and memory all scale linearly.
               • Programmable APIs: all modules of data processing will be driven by procedural
                 programming APIs, which allow parallel processing without the limitations
                 imposed by concurrency. The same data can be processed across systems for
                 different purposes, or the same logic can run across different systems. There
                 are different case studies on these techniques.
               • High-speed replication: data can be replicated at high speed across the network.
               • Localized processing of data and storage of results: the ability to process
                 data and store results locally, meaning that compute and storage occur on the
                 same disk within the storage architecture. This requires storing replicated
                 copies of data across disks.
               • Fault tolerance: with extreme replication and distributed processing, work
                 from failed systems can be rebalanced with relative ease, as web users and
                 applications demand (Fig. 2.2).
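
Several of the tenets above (distributed file-based storage, replication, localized processing, and fault tolerance) can be illustrated together in a small sketch. This is a toy model under stated assumptions, not how any particular platform implements them: the node names, the replication factor of 2, the round-robin placement, and the helper names `place_blocks`, `local_word_count`, and `run_job` are all hypothetical, and threads stand in for separate machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

REPLICATION = 2  # assumed replication factor, chosen only for illustration


def place_blocks(blocks, nodes, replication=REPLICATION):
    """Store each block on `replication` distinct nodes (round-robin placement)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {node: {} for node in nodes}
    for block_id, block in enumerate(blocks):
        for r in range(replication):
            node = nodes[(block_id + r) % len(nodes)]
            placement[node][block_id] = block
    return placement


def local_word_count(block):
    """The per-block logic; it only ever sees data held by its own node."""
    return Counter(block.split())


def run_job(placement, n_blocks, failed=frozenset()):
    """Process each block on a live node holding a replica, then merge results."""
    assignments = {}
    for block_id in range(n_blocks):
        holders = [n for n in placement
                   if n not in failed and block_id in placement[n]]
        if not holders:
            raise RuntimeError(f"block {block_id} lost: no live replica")
        assignments[block_id] = holders[0]  # rebalance to a surviving replica

    def work(item):
        block_id, node = item
        return local_word_count(placement[node][block_id])

    total = Counter()
    with ThreadPoolExecutor() as pool:  # threads stand in for separate machines
        for partial in pool.map(work, assignments.items()):
            total.update(partial)
    return total, assignments


blocks = ["big data big", "data files", "big files files"]
placement = place_blocks(blocks, ["n1", "n2", "n3"])
healthy_total, healthy_plan = run_job(placement, len(blocks))
degraded_total, degraded_plan = run_job(placement, len(blocks), failed={"n1"})
```

In this sketch, losing one node changes which node processes each block but not the merged result, because every block still has a live replica. Losing two nodes with a replication factor of 2 can leave a block with no copy, which is why the replication factor bounds how many failures can be absorbed.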

                              FIGURE 2.2 Generic new generation distributed data architecture.