20 Building Big Data Applications
were designed and implemented to solve large-scale distributed processing problems by leading companies including Google, Yahoo, Facebook, and Amazon. The fundamental tenets common to this new architecture are:
Extreme parallel processing: the ability to process data in parallel within a system and across multiple systems at the same time.
Minimal database usage: an RDBMS or DBMS will not be the central engine in the processing, removing any architectural limitations imposed by database ACID compliance.
Distributed file-based storage: data is stored in files, which is cheaper than storing it in a database. Additionally, data is distributed across systems, providing built-in redundancy.
Linearly scalable infrastructure: every piece of infrastructure added yields 100% scalability, from CPU to storage and memory.
Programmable APIs: all modules of data processing are driven by procedural programming APIs, which allow parallel processing without the limitations imposed by concurrency. The same data can be processed across systems for different purposes, or the same logic can run across different systems. There are various case studies of these techniques.
High-speed replication: data can be replicated at high speed across the network.
Localized processing of data and storage of results: the ability to process data and store results locally, meaning compute and storage occur on the same disk within the storage architecture. This means replicated copies of data must be stored across disks to accomplish localized processing.
Fault tolerance: with extreme replication and distributed processing, system failures can be rebalanced with relative ease, as mandated by web users and applications (Fig. 2.2).
FIGURE 2.2 Generic new generation distributed data architecture.
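The tenets above can be illustrated with a minimal Python sketch, assuming a toy cluster where "nodes" are just local directories: data lives in plain partition files (distributed file-based storage), each partition is copied onto more than one node (replication for fault tolerance), and one replica of each partition is processed in parallel and the partial results merged. The functions, directory layout, replication factor, and word-count workload are all illustrative assumptions, not an implementation from the book.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Assumed replication factor for this sketch; real systems make it configurable.
REPLICATION_FACTOR = 2


def write_partitions(records, node_dirs, replication=REPLICATION_FACTOR):
    """Distribute records round-robin into partition files, replicating each
    partition onto `replication` distinct node directories (built-in redundancy)."""
    partitions = {}
    for i, record in enumerate(records):
        partitions.setdefault(i % len(node_dirs), []).append(record)
    locations = {}
    for part, recs in partitions.items():
        locations[part] = []
        for r in range(replication):
            # Place each replica on a different node directory.
            node = node_dirs[(part + r) % len(node_dirs)]
            path = os.path.join(node, f"part-{part}.txt")
            with open(path, "w") as f:
                f.write("\n".join(recs))
            locations[part].append(path)
    return locations


def process_partition(path):
    """'Node-local' work: compute word counts over a single partition file."""
    counts = {}
    with open(path) as f:
        for line in f:
            for word in line.split():
                counts[word] = counts.get(word, 0) + 1
    return counts


def parallel_word_count(locations):
    """Process one replica of every partition in parallel, then merge the
    partial counts (a map/reduce-style aggregation)."""
    replicas = [paths[0] for paths in locations.values()]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(process_partition, replicas))
    total = {}
    for partial in partials:
        for word, n in partial.items():
            total[word] = total.get(word, 0) + n
    return total
```

Because every partition exists on more than one node, losing any single node directory still leaves a readable replica of each partition, which is the sense in which replication underwrites the fault-tolerance tenet.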