Page 74 - Building Big Data Applications



             the Phi and compared to a threshold, which the gossiper uses to determine the
             state of the node. This logic is implemented in the FailureDetector class,
             which has three methods:
               isAlive(node_address): what the detector reports about a given node's
                aliveness.
               interpret(node_address): used by the gossiper to decide on the health of the
                node, based on the suspicion level reached by calculating Phi (the accrued
                value of the node's responsiveness).
               report(node_address): invoked when a node receives a heartbeat.
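             The three methods above can be illustrated with a minimal phi accrual detector. This
             is a sketch, not Cassandra's code: it assumes heartbeat inter-arrival times are
             exponentially distributed, so Phi reduces to elapsed / (mean_interval * ln 10). The
             explicit `now` argument and the Python-style name `is_alive` are assumptions made to
             keep the example deterministic; the threshold of 8 is chosen to mirror Cassandra's
             default phi_convict_threshold.

```python
import math
from collections import deque


class FailureDetector:
    """Minimal phi accrual failure detector (illustrative sketch).

    Assumes exponentially distributed heartbeat intervals, so
    phi = elapsed / (mean_interval * ln 10).
    """

    def __init__(self, threshold=8.0, window_size=1000):
        self.threshold = threshold        # phi level above which a node is convicted
        self.window_size = window_size    # number of recent intervals kept per node
        self.intervals = {}               # node_address -> deque of intervals (seconds)
        self.last_heartbeat = {}          # node_address -> time of the last heartbeat
        self.alive = {}                   # node_address -> latest verdict

    def report(self, node_address, now):
        """Record a heartbeat from node_address received at time `now`."""
        last = self.last_heartbeat.get(node_address)
        if last is not None:
            window = self.intervals.setdefault(
                node_address, deque(maxlen=self.window_size))
            window.append(now - last)
        self.last_heartbeat[node_address] = now
        self.alive[node_address] = True

    def phi(self, node_address, now):
        """Suspicion level accrued since the last heartbeat."""
        window = self.intervals.get(node_address)
        if not window:
            return 0.0                    # no history yet, so no suspicion
        mean = sum(window) / len(window)
        if mean <= 0:
            return 0.0                    # degenerate history; avoid dividing by zero
        elapsed = now - self.last_heartbeat[node_address]
        return elapsed / (mean * math.log(10))

    def interpret(self, node_address, now):
        """Called by the gossiper: compare phi with the threshold and store the verdict."""
        self.alive[node_address] = self.phi(node_address, now) <= self.threshold
        return self.alive[node_address]

    def is_alive(self, node_address):
        """What the detector currently reports about node_address."""
        return self.alive.get(node_address, False)


if __name__ == "__main__":
    fd = FailureDetector()
    for t in range(10):                        # one heartbeat per second
        fd.report("10.0.0.2", float(t))
    print(fd.interpret("10.0.0.2", 10.0))      # phi is about 0.43 -> True
    print(fd.interpret("10.0.0.2", 30.0))      # phi is about 9.12 -> False
```

             Because Phi grows continuously with the silence since the last heartbeat, the gossiper
             gets a graded suspicion level rather than a fixed timeout, and the threshold controls
             the trade-off between fast detection and false convictions.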

                With the peer-to-peer and gossip protocol implementations, we can see how the
             Cassandra architecture keeps the nodes in sync and keeps operations on the nodes
             scalable and reliable. This model is derived from, and extends, the design described in
             Amazon's Dynamo paper. Based on the discussion of Cassandra so far, we can see how
             integrating the Bigtable and Dynamo architectures has produced a row-oriented column
             store that can scale and sustain performance. At the time of writing, Cassandra is a
             top-level Apache project. Facebook has already moved on to proprietary techniques for
             large-scale data management, but several large and well-known companies have adopted
             Cassandra to manage large volumes of data, especially on the web, where customers or
             users interact continuously.
                Many more details on implementing and tuning Cassandra will be covered in the
             latter half of this book, when we discuss the implementation and integration
             architectures.

             Basho Riak

             Riak is a document-oriented database. It is similar in architecture to Cassandra, and
             its default setup is a four-node cluster. It follows the same ring topology and gossip
             protocols in its underlying architecture. Each of the four nodes hosts eight virtual
             nodes, giving a ring of 32 partitions. These virtual nodes (vnodes) manage the
             partitions across the four-node cluster. Riak is written in Erlang and supports
             MapReduce for querying data. Another interesting feature of Riak is the concept of
             links and link walking. Links are metadata that connect one object to another. Once
             links are created, you can traverse from object to object; this traversal is called
             link walking. The flexibility of links allows you to decide dynamically how to connect
             multiple objects. More information on Riak is available on the website of Basho, the
             company that designed and developed Riak.
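             The ring layout described for Riak (four physical nodes, eight vnodes each, 32
             partitions) can be sketched as consistent hashing over a 160-bit SHA-1 space. This is a
             simplified illustration under stated assumptions: partitions are claimed round-robin,
             whereas Riak's real claim algorithm is more involved, and the "bucket/key" string
             encoding is a simplification chosen for this sketch. The function names are
             hypothetical.

```python
import hashlib
from collections import Counter

RING_SIZE = 2 ** 160  # SHA-1 yields 160-bit values; treat that space as a ring


def build_ring(nodes, num_partitions):
    """Assign partitions to physical nodes round-robin (a simplified claim strategy).

    num_partitions is assumed to be a power of two so the ring divides evenly.
    """
    return [nodes[i % len(nodes)] for i in range(num_partitions)]


def partition_for(bucket, key, num_partitions):
    """Index of the partition whose slice of the ring contains hash(bucket/key)."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_SIZE // num_partitions)


def owner(ring, bucket, key):
    """Physical node whose vnode owns the partition for bucket/key."""
    return ring[partition_for(bucket, key, len(ring))]


if __name__ == "__main__":
    ring = build_ring(["node1", "node2", "node3", "node4"], 32)
    print(Counter(ring))               # each node holds 8 of the 32 partitions
    print(owner(ring, "users", "alice"))
```

             Because keys are placed by hashing rather than by node, adding a node only means
             reassigning some partitions (vnodes) to it; the key-to-partition mapping itself does
             not change.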
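             Link walking, as described for Riak above, can be pictured with a small in-memory
             model. Everything here is hypothetical (the `RiakObject` and `Store` classes and the
             `walk` method are not Riak's API): each object carries (bucket, key, tag) link
             metadata, and a walk follows one hop per step, filtering by bucket and tag. `None`
             plays the role of a wildcard, loosely analogous to Riak's "_".

```python
from dataclasses import dataclass, field


@dataclass
class RiakObject:
    """Hypothetical stand-in for a stored object plus its link metadata."""
    bucket: str
    key: str
    value: dict
    links: list = field(default_factory=list)  # (target_bucket, target_key, tag)


class Store:
    """Hypothetical in-memory store used only to illustrate link walking."""

    def __init__(self):
        self.objects = {}  # (bucket, key) -> RiakObject

    def put(self, obj):
        self.objects[(obj.bucket, obj.key)] = obj

    def link(self, src, dst, tag):
        """Attach link metadata from src to dst, labelled with tag."""
        src.links.append((dst.bucket, dst.key, tag))

    def walk(self, start, steps):
        """Follow links hop by hop; each step is (bucket, tag), None matches anything."""
        frontier = [start]
        for bucket, tag in steps:
            next_frontier = []
            for obj in frontier:
                for link_bucket, link_key, link_tag in obj.links:
                    if bucket not in (None, link_bucket) or tag not in (None, link_tag):
                        continue
                    target = self.objects.get((link_bucket, link_key))
                    if target is not None:
                        next_frontier.append(target)
            frontier = next_frontier
        return frontier


if __name__ == "__main__":
    store = Store()
    alice = RiakObject("people", "alice", {"name": "Alice"})
    bob = RiakObject("people", "bob", {"name": "Bob"})
    carol = RiakObject("people", "carol", {"name": "Carol"})
    for o in (alice, bob, carol):
        store.put(o)
    store.link(alice, bob, "friend")
    store.link(bob, carol, "friend")
    # Friends of friends of alice
    print([o.key for o in store.walk(alice, [("people", "friend"), ("people", "friend")])])
    # prints ['carol']
```

             The connections live in metadata on the objects themselves, so new relationships can
             be added at any time without changing a schema, which is the flexibility the text
             refers to.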
                Other popular NoSQL implementations are document databases (Couchbase,
             MongoDB, and others) and graph databases (Neo4j). Let us look at the premise
             behind the document database and graph database architectures.
                Document-oriented databases, or document databases, can be defined as a schema-less,
             flexible model of storing data as documents rather than as relational structures. The