Page 66 - Building Big Data Applications


             compliance. Dynamo and Memcached inspired the database architecture. Data is stored
             as key-value pairs and organized in a ring topology, with redundancy and range
             management built into each node of the ring. The architecture addresses a narrow
             niche of problems and hence did not gain wide adoption outside of LinkedIn.
             It is still being evolved and updated at the time of writing.
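
The ring topology can be illustrated with a small sketch. It assumes Dynamo-style consistent hashing, which the text does not spell out: each node owns the key range ending at its position on the ring, and the next nodes clockwise hold the redundant copies. The `Ring` class, the node names, and the use of MD5 are hypothetical choices made only for illustration.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a stable position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical ring: each node owns the key range ending at its position."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        # Sort nodes by their position so that ranges can be found by binary search.
        self.entries = sorted((ring_position(n), n) for n in nodes)
        self.positions = [p for p, _ in self.entries]

    def nodes_for(self, key):
        """Return the owner of `key` followed by the next nodes clockwise (replicas)."""
        size = len(self.entries)
        # First node at or after the key's position; wrap around past the last node.
        start = bisect.bisect_left(self.positions, ring_position(key)) % size
        count = min(self.replicas, size)
        return [self.entries[(start + i) % size][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replicas=2)
owners = ring.nodes_for("member:42")
```

Because placement depends only on hash positions, any node can compute where a key lives without consulting a central directory, which is one reason the ring layout suits a design with no single master.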

             Cassandra

             In its initial years, Facebook used a leading commercial database solution for its
             internal architecture, in conjunction with some Hadoop. Eventually the tsunami of users
             led the company to start thinking in terms of unlimited scalability and to focus on
             availability and distribution. The nature of the data and its producers and consumers
             did not mandate consistency but needed unlimited availability and scalable performance.
             The team at Facebook built an architecture, named Cassandra, that combines the data
             model approaches of Bigtable and the infrastructure approaches of Dynamo with
             scalability and performance capabilities. It is often referred to as a hybrid
             architecture because it combines the column-oriented data model from Bigtable with
             Hadoop MapReduce jobs and implements patterns from Dynamo such as eventual
             consistency, gossip protocols, and a master-master way of serving both read and
             write requests. Cassandra supports a full replication model based on NoSQL architectures.
                The Cassandra team had a few design goals to meet, considering that at the time of
             first development the architecture was primarily being deployed at Facebook.
             The goals included the following:

               High availability
               Eventual consistency
               Incremental scalability
               Optimistic replication
               Tunable tradeoffs between consistency, durability, and latency
               Low cost of ownership
               Minimal administration
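
The tunable tradeoff between consistency and latency in the list above can be made concrete with the usual quorum rule from Dynamo-style stores: with N replicas, a write acknowledged by W replicas and a read that consults R replicas are guaranteed to overlap on at least one replica when R + W > N. The sketch below only checks that arithmetic; the function name and the example settings are hypothetical and do not correspond to any Cassandra API.

```python
def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    """Return True when any R replicas must overlap any W replicas out of N."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


# (N, W, R) settings, from lowest latency to strongest consistency.
settings = {
    "fast, eventually consistent": (3, 1, 1),
    "quorum reads and writes": (3, 2, 2),
    "write all, read one": (3, 3, 1),
}
results = {name: read_sees_latest_write(*cfg) for name, cfg in settings.items()}
```

Lower values of W and R reduce the number of replicas a request waits for, and therefore latency, at the price of reads that may return stale data until replication catches up.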


             Data model
             The Cassandra data model is based on a key-value model, where we have a key that uniquely
             identifies a value, and this value can be structured or completely unstructured or can also
             be a collection of other key-value elements. This is very similar to pointers and linked
             lists in the world of programming. Fig. 2.22 shows the basic key-value structure.







                                           FIGURE 2.22 Key-value pair.
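
As a minimal sketch of the key-value idea, a plain Python dictionary can stand in for the store. This is an analogy, not Cassandra's storage format or client API; the keys, the sample values, and the `lookup` helper are hypothetical.

```python
store = {}

# Unstructured value: the store treats it as opaque bytes.
store["user:1:bio"] = b"free-form text or raw bytes"

# Structured value: a simple typed field.
store["user:1:age"] = 34

# Value that is itself a collection of other key-value elements.
store["user:1"] = {"name": "Ada", "emails": {"home": "ada@example.org"}}


def lookup(data, key, *path):
    """Fetch the value for `key`, then follow nested keys, much like chasing pointers."""
    value = data[key]
    for part in path:
        value = value[part]
    return value


home_email = lookup(store, "user:1", "emails", "home")
```

Following a path of nested keys is where the comparison to pointers and linked lists comes from: each value may lead on to further key-value elements.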