Page 64 - Building Big Data Applications
FIGURE 2.20 Tipping point in NoSQL (Google Bigtable, Amazon Dynamo, and the CAP Theorem).
data in a pseudo database environment, but not oriented completely toward SQL, were being discussed, and the name NoSQL (Not only SQL) was coined by Eric Evans for a user group meeting to discuss the need for nonrelational, non-SQL-driven databases. The name has since become the industry-adopted term for a class of databases that share similar architectures but are purpose-built for different workloads.
Three significant developments moved NoSQL databases from a niche solution to an alternative platform (Fig. 2.20):
Google publishes the Bigtable architecture (http://labs.google.com/papers/bigtable.html)
Eric Brewer discusses the CAP Theorem
Amazon publishes Dynamo (http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html)
Dynamo presented a highly available key-value store infrastructure, and Bigtable presented a data storage model based on a multidimensional sorted map, in which the intersection of a row key, a column key, and a timestamp addresses any value across petabytes of data. Both architectures were designed to be deployed at large scale as distributed data processing systems that split the workload into pieces, replicate data for redundancy, and drive computation programmatically. Both papers became the basis for the further evolution of the architecture into multiple classes of databases. These architectures, together with the CAP theorem, are discussed in later sections of this book, when we talk about the architecture of the future data warehouse and next-generation analytics.
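To make the Bigtable data model concrete, here is a minimal in-memory sketch of a map keyed by (row key, column key, timestamp), with rows kept in sorted order and each cell holding several timestamped versions, newest first. The class name SortedMultiMap, its put/get/scan methods, and the row and column keys in the usage lines are hypothetical, invented for illustration; this is not Bigtable's API, and it leaves out column families, tablet splitting across servers, and persistent storage.

```python
import bisect
from collections import defaultdict


class SortedMultiMap:
    """Toy in-memory model of a Bigtable-style map:
    (row key, column key, timestamp) -> value.
    Hypothetical illustration only; this is not Bigtable's API."""

    def __init__(self):
        # row key -> column key -> list of (timestamp, value), newest first
        self._rows = {}
        # row keys kept in lexicographic order so range scans are cheap
        self._row_index = []

    def put(self, row, column, timestamp, value):
        """Store a new version of a cell; older versions are kept."""
        if row not in self._rows:
            bisect.insort(self._row_index, row)
            self._rows[row] = defaultdict(list)
        versions = self._rows[row][column]
        versions.append((timestamp, value))
        # Keep versions in decreasing timestamp order (newest first).
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (latest if None)."""
        versions = self._rows.get(row, {}).get(column, [])
        for ts, value in versions:
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row, end_row):
        """Yield (row, column, latest value) for rows in [start_row, end_row)."""
        lo = bisect.bisect_left(self._row_index, start_row)
        hi = bisect.bisect_left(self._row_index, end_row)
        for row in self._row_index[lo:hi]:
            for column in sorted(self._rows[row]):
                yield row, column, self._rows[row][column][0][1]


# Usage: two versions of one cell, plus a second row.
table = SortedMultiMap()
table.put("com.example.www", "contents:", 1, "<html>v1</html>")
table.put("com.example.www", "contents:", 2, "<html>v2</html>")
table.put("com.example.www", "anchor:other.com", 1, "Example")
table.put("org.sample.www", "contents:", 1, "<html>s1</html>")

print(table.get("com.example.www", "contents:"))     # <html>v2</html>
print(table.get("com.example.www", "contents:", 1))  # <html>v1</html>
print(list(table.scan("com", "org")))  # only the com.example.www row
```

Keeping row keys sorted is what makes a contiguous range of rows cheap to read, which is why a real Bigtable schema puts so much weight on row-key design; the timestamp dimension lets one cell hold its history rather than a single value.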
CAP theorem
In 2000, at the “Symposium on Principles of Distributed Computing,” Eric Brewer presented a theory that he had been working on for a few years at UC Berkeley and at his company Inktomi. He argued that three core systemic requirements must be considered when designing and deploying applications in a distributed environment, and that the relationship among them creates a tension: to meet the scalability requirements of your situation, you must decide which requirement to give up. The three requirements are consistency, availability, and partition tolerance, which give Brewer’s Theorem its other name, CAP (Fig. 2.21).
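The trade-off is easiest to see at the moment a partition actually happens. The following minimal sketch is not taken from Brewer's presentation; it models a hypothetical two-replica store (TwoReplicaStore, Replica, and Unavailable are invented names) that, once the link between replicas is cut, must either refuse requests, preserving consistency at the cost of availability, or answer from local state, staying available at the risk of stale reads. It assumes only two replicas, so in CP mode neither side holds a majority and both refuse.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a CP-mode replica refuses a request during a partition."""


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)


class TwoReplicaStore:
    """Hypothetical two-replica store illustrating the CAP trade-off.

    mode="CP": refuse requests while partitioned (give up availability).
    mode="AP": serve requests locally while partitioned (give up consistency).
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {"A": Replica("A"), "B": Replica("B")}
        self.partitioned = False  # True simulates a network split between A and B

    def write(self, node, key, value):
        if not self.partitioned:
            # Healthy network: replicate synchronously to every replica.
            for replica in self.replicas.values():
                replica.data[key] = value
        elif self.mode == "CP":
            # The other replica is unreachable, so refuse rather than diverge.
            raise Unavailable(f"{node} cannot replicate {key!r}")
        else:
            # AP: accept the write locally; the other replica is now stale.
            self.replicas[node].data[key] = value

    def read(self, node, key):
        if self.partitioned and self.mode == "CP":
            # With only two replicas neither side has a majority, so CP refuses.
            raise Unavailable(f"{node} cannot confirm the latest {key!r}")
        return self.replicas[node].data.get(key)


# Usage: the same partition, handled two different ways.
for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write("A", "x", 1)   # replicated to A and B
    store.partitioned = True   # the link between A and B fails
    try:
        store.write("A", "x", 2)
        print(mode, "read from B:", store.read("B", "x"))
    except Unavailable as exc:
        print(mode, "unavailable:", exc)
# CP unavailable: A cannot replicate 'x'
# AP read from B: 1
```

In the AP run, replica B still returns 1 after A has accepted 2; in the CP run, the store stays consistent only by refusing the write. Production systems usually soften this choice with quorums and later reconciliation of divergent replicas, which this sketch deliberately omits.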