Page 25 - Building Big Data Applications
Chapter 2 Infrastructure and technology
Minimal fault tolerance within infrastructure and expensive fault tolerance solutions
Due to the inherent complexities and the economies of scale, the world of data
warehousing did not adopt the concept of large-scale distributed data processing. The
world of OLTP, on the other hand, adopted and deployed distributed data processing
architectures using heterogeneous and proprietary techniques, though this was largely
confined to large enterprises, where latencies were not the primary concern. The most
popular implementation of this architecture is called client-server data processing.
The client-server architecture provided only limited scalability and flexibility, with
its own benefits and limitations:
Benefits
- Centralized administration, security, and setup
- Inexpensive backup and recovery, since an outage at the server or a client can be restored from a central point
- Infrastructure can be scaled by adding server or client capacity, though the scalability is not linear
- The server is accessible from heterogeneous platforms, locally or remotely
- Clients can use servers for different types of processing
Limitations
- The server is a single point of failure
- Very limited scalability
- Performance can degrade with network congestion
- When too many clients access a single server, requests cannot be processed quickly
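The request/response pattern behind these trade-offs can be sketched with a minimal TCP server and client. This is a toy illustration only, not from the book: names such as run_server and client_request are invented, and the server handles a single connection, which makes the central-bottleneck and single-point-of-failure limitations easy to see.

```python
import socket
import threading

def run_server(host="127.0.0.1", port=0):
    """A single central server: all processing funnels through this one process."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()                        # clients queue in the OS backlog
    port = srv.getsockname()[1]         # actual port chosen by the OS

    def serve_one():
        conn, _ = srv.accept()          # one client is served at a time
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"processed: " + data)  # the server does the work
        srv.close()                     # if the server dies, every client is stranded

    threading.Thread(target=serve_one, daemon=True).start()
    return host, port

def client_request(host, port, payload):
    """A thin client: sends a request and blocks until the server replies."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)
        return sock.recv(1024)

host, port = run_server()
print(client_request(host, port, b"order #42"))  # b'processed: order #42'
```

Because every request must pass through the one server socket, adding clients increases contention rather than capacity, which is why client-server scalability is not linear.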
In the late 1980s and early 1990s there were several attempts at distributed data
processing in the OLTP world, with the emergence of "object-oriented programming"
and "object store databases". We learned that with effective programming and non-
relational data stores, we could effectively scale distributed data processing across
multiple computers. At the same time, the Internet was gaining adoption, and web
commerce or e-commerce was beginning to take shape. To serve Internet users faster
and better, networking improved rapidly, delivering higher speeds and bandwidth at
lower cost, while the commoditization of infrastructure platforms reduced the cost
barrier of hardware.
The perfect storm arose from the biggest challenge faced by web applications and
search engines: unlimited scalability with sustained performance at the lowest
computing cost. Though this problem existed prior to the advent of the Internet, its
intensity and complexity were not comparable to what web applications brought about.
Another significant movement beginning to gain notice was nonrelational databases
(specialty databases) and NoSQL (not only SQL).
Combining the commoditization of infrastructure with distributed data processing
techniques, including NoSQL, highly scalable and flexible data processing architectures