Page 64 - Building Big Data Applications



                                        FIGURE 2.20 Tipping point in NoSQL (figure labels: CAP Theorem, Amazon Dynamo, Google Bigtable).
             data in a pseudo database environment but not oriented completely toward SQL were
             being discussed, and the name NoSQL (Not only SQL) database was coined by Eric Evans
             for the user group meeting held to discuss the need for nonrelational and non-SQL-driven
             databases. This name has become the industry-adopted name for a class of databases
             that work on similar architectures but are purpose-built for different workloads.
                Three significant papers changed the NoSQL database from a niche solution into an
             alternative platform (Fig. 2.20).

               Google publishes the Bigtable architecture (http://labs.google.com/papers/bigtable.html)
               Eric Brewer discusses the CAP Theorem
               Amazon publishes Dynamo (http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html)

                Dynamo presented a highly available key-value store infrastructure, and Bigtable
             presented a data storage model based on a multidimensional sorted map, where the
             three-dimensional intersection of a row key, column key, and timestamp provides access
             to any item in petabytes of data. Both of these scalable architectures showed how a
             distributed data processing system can be deployed at large scale to process different
             pieces of a workload, with replication for redundancy and computations driven
             programmatically. Both papers are the basis for the further evolution of the
             architecture into multiple classes of databases. These architectures, in conjunction with
             the CAP theorem, are discussed in later sections of this book, when we talk about the
             architecture of the future data warehouse and next-generation analytics.
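
             The sorted-map model described above can be sketched in a few lines of Python. This is
             only an illustrative toy, not Bigtable's implementation: the class name SortedCellMap,
             its methods, and the sample row and column keys are all hypothetical. It assumes string
             row and column keys and integer timestamps, and stores the timestamp negated so that
             the newest version of a cell sorts first.

```python
from bisect import bisect_left, insort


class SortedCellMap:
    """Toy map from (row key, column key, timestamp) to a value, kept sorted.

    Hypothetical illustration only; not the API of Bigtable or any real store.
    """

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # (row, column, -timestamp) -> value

    def put(self, row, column, timestamp, value):
        # Negating the timestamp makes newer versions sort before older ones.
        key = (row, column, -timestamp)
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (latest if None)."""
        limit = float("inf") if timestamp is None else timestamp
        i = bisect_left(self._keys, (row, column, -limit))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan_rows(self, start_row, end_row):
        """Yield cells with start_row <= row key < end_row, in sorted order."""
        i = bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


cells = SortedCellMap()
cells.put("com.example/index", "contents:html", 1, "v1")
cells.put("com.example/index", "contents:html", 3, "v3")
cells.put("com.example/about", "contents:html", 2, "about")

latest = cells.get("com.example/index", "contents:html")
as_of_2 = cells.get("com.example/index", "contents:html", timestamp=2)
```

             Because the keys are kept in sorted order, a range of adjacent row keys can be read
             with a single scan, which is the property that makes the sorted map attractive at scale.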


             CAP theorem

             In 2000, at the “Symposium on Principles of Distributed Computing”, Eric Brewer presented
             a theory that he had been working on for a few years at UC Berkeley and at his company
             Inktomi. He presented the concept that three core systemic requirements need to be
             considered when designing and deploying applications in a distributed environment, and
             further stated that the relationship between these requirements creates shear: you must
             decide which requirement you can give up to meet the scalability requirements of your
             situation. The three requirements are consistency, availability, and partition
             tolerance, giving Brewer’s Theorem its other name: CAP (Fig. 2.21).
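
             The trade-off can be made concrete with a deliberately simplified sketch. Everything
             here is an assumption made for illustration: the TwoNodeStore class, its "CP" and "AP"
             mode labels, and the partitioned flag are hypothetical, and real systems use quorums,
             timeouts, and reconciliation rather than a single switch. The sketch only shows that,
             once two replicas cannot communicate, a store must either refuse writes (giving up
             availability) or accept them and let the replicas diverge (giving up consistency).

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store; `mode` is "CP" or "AP" (hypothetical labels)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True when a and b cannot exchange messages

    def write(self, replica, key, value):
        """Write through one replica; return True if the write was accepted."""
        if not self.partitioned:
            # Healthy network: apply the write to both replicas.
            self.a.data[key] = value
            self.b.data[key] = value
            return True
        if self.mode == "CP":
            # Stay consistent by refusing the write: availability is given up.
            return False
        # AP: accept the write locally; the other replica becomes stale.
        replica.data[key] = value
        return True

    def read(self, replica, key):
        return replica.data.get(key)


cp = TwoNodeStore("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
cp_accepted = cp.write(cp.a, "x", 2)  # refused while partitioned

ap = TwoNodeStore("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap_accepted = ap.write(ap.a, "x", 2)  # accepted, but replica b still holds 1
```

             In the CP case both replicas still agree on the old value after the partition; in the
             AP case the write succeeds but a read from the other replica returns stale data.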