Page 25 - Building Big Data Applications

Chapter 2   Infrastructure and technology  19


                   Minimal fault tolerance within infrastructure and expensive fault tolerance
                   solutions
Due to its inherent complexities and the economies of scale, the world of data warehousing did not adopt the concept of large-scale distributed data processing. The world of OLTP, on the other hand, adopted and deployed distributed data processing architectures using heterogeneous and proprietary techniques, though this was largely confined to large enterprises, where latencies were not the primary concern. The most popular implementation of this architecture is called client-server data processing.

The client-server architecture had its own benefits and limitations, providing only limited scalability and flexibility:
Benefits
- Centralization of administration, security, and setup
- Backup and recovery of data is inexpensive, as outages at the server or a client can be restored
- Infrastructure can be scaled by adding more server or client capacity, though the scalability is not linear
- The server is accessible from heterogeneous platforms, locally or remotely
- Clients can use the server for different types of processing

Limitations
- The server is a central point of failure
- Very limited scalability
- Performance can degrade with network congestion
- With too many clients accessing a single server, data cannot be processed quickly
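The single-server bottleneck described above can be seen in a minimal sketch of the client-server pattern (illustrative only; the port handling and message format are assumptions, not any particular product's protocol): every client request must pass through one central server, which is therefore both the scalability limit and the single point of failure.

```python
import socket
import threading

def run_server(sock):
    """Serve one request per connection until the socket is closed."""
    while True:
        try:
            conn, _ = sock.accept()
        except OSError:
            break  # server socket closed: the central point of failure
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"processed:" + data)  # all work is done server-side

def client_request(port, payload):
    """A thin client: sends a request and waits on the central server."""
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(payload)
        return c.recv(1024)

# Central server on an OS-assigned free port; clients all depend on it.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
port = server.getsockname()[1]
threading.Thread(target=run_server, args=(server,), daemon=True).start()

reply = client_request(port, b"order-42")
server.close()
```

If the server process dies, no client can be served, and adding clients only increases contention on the one server: exactly the limitations listed above.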
In the late 1980s and early 1990s there were several attempts at distributed data processing in the OLTP world, with the emergence of "object-oriented programming" and "object store databases". We learned that with effective programming and nonrelational data stores, we could effectively scale distributed data processing across multiple computers. At the same time the Internet was gaining adoption, and web commerce or e-commerce was beginning to take shape. To serve Internet users faster and better, several improvements rapidly emerged in the field of networking, delivering higher speeds and bandwidth at lower cost. At the same time, the commoditization of infrastructure platforms reduced the cost barrier of hardware.
The perfect storm was created by the biggest challenge faced by web applications and search engines: unlimited scalability while maintaining sustained performance at the lowest computing cost. Though this problem existed prior to the advent of the Internet, its intensity and complexity were not comparable to what web applications brought about. Another significant movement beginning to gain notice was nonrelational databases (specialty databases) and NoSQL (not only SQL). Combining the commoditization of infrastructure with distributed data processing techniques, including NoSQL, produced highly scalable and flexible data processing architectures.
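The core idea behind these scalable architectures can be sketched with hash-based sharding: data is spread deterministically across many commodity nodes instead of one central server. This is a minimal illustration under stated assumptions; the node names and key format are hypothetical, not any particular NoSQL product's API.

```python
import hashlib

# Hypothetical pool of commodity servers (names are illustrative).
NODES = ["node-a", "node-b", "node-c"]

def node_for(key: str) -> str:
    """Route a key to a node by hashing it, so data and load spread
    across machines rather than converging on one central server."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Every key lands deterministically on exactly one node, so any client
# can locate data without asking a central coordinator.
placement = {k: node_for(k) for k in ("user:1", "user:2", "user:3")}
```

Unlike the client-server model, capacity here grows by adding nodes to the pool, which is what made commodity hardware plus distributed techniques so attractive for web-scale workloads.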