Page 65 - Building Big Data Applications
Chapter 2 Infrastructure and technology
FIGURE 2.21 CAP theorem (triangle diagram with vertices C, A, and P).
In simple terms, the CAP theorem states that a distributed data system can guarantee
at most two of the following three requirements at the same time: consistency (all
nodes or systems see the same data), availability (every request gets a response), and
partition tolerance (the system keeps operating despite a partition or a loss of data or
communication between nodes). A system architected on this model is called a BASE
(basically available, soft state, eventually consistent) architecture, as opposed to ACID.
Combining the principles of the CAP theorem and the data architecture of Bigtable or
Dynamo, several solutions have evolved: HBase, MongoDB, Riak,
Voldemort, Neo4J, Cassandra, Hypertable, HyperGraphDB, Memcached, Tokyo Cabinet,
Redis, CouchDB, and more niche solutions. Of these, the most popular and widely
distributed are the following:
HBase, Hypertable, Bigtable: architected on CP (from CAP)
Cassandra, Dynamo, Voldemort: architected on AP (from CAP)
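To make the CP versus AP distinction concrete, here is a minimal sketch in Python of two replicas that lose their connection to each other. It is a toy model under stated assumptions, not how any of the products named above is implemented: the names Replica, PartitionError, make_pair, and set_link are hypothetical, the shared logical clock is a simplification, and conflicts are resolved with a simple last-write-wins rule. Real CP systems usually keep the majority side of a partition running, whereas this sketch refuses requests on both sides.

```python
import itertools

# Shared logical clock used to order writes (a simplification for this sketch).
_clock = itertools.count(1)


class PartitionError(Exception):
    """Raised when a CP-style replica refuses a request during a partition."""


class Replica:
    def __init__(self, name, mode):
        self.name = name
        self.mode = mode      # "CP" or "AP" (hypothetical labels)
        self.store = {}       # key -> (version, value)
        self.peer = None
        self.link_up = True   # False while the replica is partitioned from its peer

    def write(self, key, value):
        if not self.link_up and self.mode == "CP":
            # Consistency preferred: refuse the write rather than let replicas diverge.
            raise PartitionError(f"{self.name}: peer unreachable, write refused")
        record = (next(_clock), value)
        self.store[key] = record
        if self.link_up:
            # Replicate immediately while the link is healthy.
            self.peer.store[key] = record
        # In AP mode during a partition the write is only local (soft state).

    def read(self, key):
        if not self.link_up and self.mode == "CP":
            raise PartitionError(f"{self.name}: peer unreachable, read refused")
        record = self.store.get(key)
        return None if record is None else record[1]


def make_pair(mode):
    """Create two replicas of the same mode that replicate to each other."""
    a, b = Replica("a", mode), Replica("b", mode)
    a.peer, b.peer = b, a
    return a, b


def set_link(a, b, up):
    """Cut or restore the link; on restore, reconcile with last-write-wins."""
    a.link_up = b.link_up = up
    if up:
        for key in set(a.store) | set(b.store):
            candidates = [r.store[key] for r in (a, b) if key in r.store]
            newest = max(candidates)  # highest version wins
            a.store[key] = b.store[key] = newest


if __name__ == "__main__":
    a, b = make_pair("AP")
    a.write("cart", "book")
    set_link(a, b, up=False)
    a.write("cart", "book+pen")   # accepted locally: available but not consistent
    print(b.read("cart"))         # b still serves the older value
    set_link(a, b, up=True)
    print(b.read("cart"))         # replicas converge after the partition heals
```

In AP mode both replicas keep answering during the partition and converge once the link is restored, which is the "eventually consistent" part of BASE. In CP mode the same calls raise PartitionError, trading availability for consistency.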
Broadly, NoSQL databases have been classified into four subcategories.
Key-value pair: This model is implemented using a hash table in which a
unique key points to a particular item of data, creating a key-value pair.
Examples: Voldemort and Riak.
Column family stores: An extension of the key-value architecture with columns
and column families; the overall goal was to process distributed data over a pool of
infrastructure. Examples: HBase and Cassandra.
Document databases: This class of databases is modeled after Lotus Notes and is
similar to key-value stores. The data is stored as a document and is represented in JSON
or XML format. The biggest design feature is the flexibility to nest multiple levels of
key-value pairs. Example: CouchDB.
Graph databases: Based on graph theory, this class of databases supports
scalability across a cluster of machines. Support for representing extremely
complex sets of documents is still evolving. Example: Neo4J.
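The four data models can be sketched side by side with plain Python structures. This is an illustration only, assuming made-up sample data (the user "Ada", the key "user:42", and the relation names are hypothetical); it does not use the API of Voldemort, HBase, CouchDB, Neo4J, or any other product, and the helper functions get_column, neighbors, and friends_of_friends are names introduced here.

```python
import json

# All names and values below are made-up sample data, not from any real system.

# 1. Key-value pair: a hash table maps a unique key to an opaque value.
#    The store does not interpret the value; here it is a serialized string.
kv_store = {"user:42": json.dumps({"name": "Ada"})}

# 2. Column family store: row key -> column family -> column -> value.
column_store = {
    "user:42": {
        "profile": {"name": "Ada", "city": "London"},
        "activity": {"last_login": "2020-01-01"},
    }
}


def get_column(row_key, family, column):
    """Read one cell; returns None if the row, family, or column is missing."""
    return column_store.get(row_key, {}).get(family, {}).get(column)


# 3. Document database: one document holds multiple nested levels of
#    key-value pairs and can be serialized as JSON.
document = {
    "_id": "user:42",
    "name": "Ada",
    "address": {"city": "London", "geo": {"lat": 51.5, "lon": -0.1}},
}

# 4. Graph database: data is modeled as nodes connected by labeled edges.
edges = [
    ("ada", "KNOWS", "bob"),
    ("bob", "KNOWS", "eve"),
    ("ada", "WORKS_AT", "acme"),
]


def neighbors(node, relation):
    """Nodes reachable from `node` over one edge with the given label."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]


def friends_of_friends(node):
    """Two-hop traversal over KNOWS edges, excluding the start node."""
    result = set()
    for friend in neighbors(node, "KNOWS"):
        result.update(neighbors(friend, "KNOWS"))
    result.discard(node)
    return result


if __name__ == "__main__":
    print(json.loads(kv_store["user:42"]))
    print(get_column("user:42", "profile", "city"))
    print(json.dumps(document, indent=2))
    print(friends_of_friends("ada"))
```

The key-value store can only fetch a whole value by its key, the column family store addresses individual cells, the document keeps nested structure queryable, and the graph makes relationship traversal the primary operation.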
Let us focus on the different classes of NoSQL databases and understand their
technology approaches. We have already discussed HBase in the Hadoop sections of
this chapter.
Key-value pair: Voldemort
Voldemort is a project that originated at LinkedIn. The underlying need at LinkedIn was
a highly scalable, lightweight database that can work without the rigidity of ACID