Page 74 - Building Big Data Applications

the Phi and compared to a threshold, which will be used by the gossiper to determine the
state of the node. The implementation is provided by the “FailureDetector” class,
which has three methods:
isAlive(node_address): what the detector reports about a given node’s
aliveness.
interpret(node_address): used by the gossiper to decide on the health of the
node, based on the suspicion level reached by calculating Phi (the accrued
value of the state of responsiveness).
report(node_address): invoked by a node when it receives a heartbeat.
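
The three methods above can be illustrated with a minimal sketch of a Phi accrual failure detector. This is not Cassandra's actual code. It assumes heartbeat intervals follow an exponential distribution, so that Phi is the time since the last heartbeat divided by (mean interval x ln 10), and it assumes a conviction threshold of 8. The explicit `now` argument, the `phi` helper, the `window` size, and the snake_case method names are illustrative choices that do not come from the text.

```python
import math
from collections import defaultdict, deque


class FailureDetector:
    """Toy Phi accrual failure detector (illustrative only)."""

    def __init__(self, threshold=8.0, window=1000):
        self.threshold = threshold  # assumed conviction threshold
        # Sliding window of recent heartbeat inter-arrival times per node.
        self.intervals = defaultdict(lambda: deque(maxlen=window))
        self.last_heartbeat = {}
        self.alive = {}

    def report(self, node_address, now):
        """Record a heartbeat from the node at time `now` (in seconds)."""
        last = self.last_heartbeat.get(node_address)
        if last is not None:
            self.intervals[node_address].append(now - last)
        self.last_heartbeat[node_address] = now
        self.alive[node_address] = True

    def phi(self, node_address, now):
        """Suspicion level: grows the longer the node stays silent."""
        intervals = self.intervals.get(node_address)
        if not intervals:
            return 0.0  # not enough history to suspect anything
        mean = sum(intervals) / len(intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_heartbeat[node_address]
        # -log10(exp(-elapsed / mean)) under the exponential assumption
        return elapsed / (mean * math.log(10))

    def interpret(self, node_address, now):
        """Compare Phi to the threshold and update the node's state."""
        if self.phi(node_address, now) > self.threshold:
            self.alive[node_address] = False
        return self.is_alive(node_address)

    def is_alive(self, node_address):
        """What the detector currently reports about the node."""
        return self.alive.get(node_address, False)
```

In this sketch a gossiper would call `report` on every heartbeat and `interpret` periodically. A node that has been silent for roughly 18 or more mean intervals crosses the assumed threshold of 8 and is marked down until its next heartbeat arrives.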

With the peer-to-peer and gossip protocol implementations, we can see how the
Cassandra architecture keeps the nodes synced and the operations on the nodes scalable
and reliable. This model is derived and enhanced from Amazon’s Dynamo paper. Based
on the discussion of Cassandra so far, we can see how the integration of two
architectures, Bigtable and Dynamo, has created a row-oriented column store that can
scale and sustain performance. At the time of writing, Cassandra is a top-level project
at Apache. Facebook has already moved on to proprietary techniques for large-scale data
management, but several large and well-known companies have adopted and
implemented Cassandra for their large-scale data management needs, especially on the
web, with continuous customer or user interactions.
There are many more details on implementing Cassandra and performance tuning,
which will be covered in the latter half of this book when we discuss the implementation
and integration architectures.

Basho Riak

Riak is a distributed key-value database that is often used to store documents. It is
similar in architecture to Cassandra, and the default setup is a four-node cluster. It
follows the same ring topology and gossip protocols in the underpinning architecture.
Each of the four nodes hosts eight virtual nodes (vnodes), each responsible for one
partition of the ring, thus providing 32 partitions in total. The vnodes manage the
partitions across the four-node cluster. Riak is written in Erlang and supports
MapReduce. Another interesting feature of Riak is the concept of links and link
walking. Links enable you to create metadata to connect objects. Once you create links,
you can traverse the objects; this process is called link walking. The flexibility of links
allows you to determine dynamically how to connect multiple objects. More information
on Riak is available at the website of Basho, the company that designed and developed
Riak.
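
The partition arithmetic and the idea of link walking described above can be sketched in a few lines. This is a toy model, not Riak's API. The node names, the round-robin ownership rule, the in-memory object layout, and the function names `assign_partitions` and `walk_links` are all assumptions made for illustration.

```python
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names
VNODES_PER_NODE = 8                           # eight vnodes per node, as in the text

# Hypothetical objects: each has a value and a list of (target_key, tag) links.
SAMPLE_OBJECTS = {
    "alice": {"value": "Alice", "links": [("bob", "friend"), ("acme", "employer")]},
    "bob": {"value": "Bob", "links": [("carol", "friend")]},
    "carol": {"value": "Carol", "links": []},
    "acme": {"value": "Acme Corp", "links": []},
}


def assign_partitions(nodes, vnodes_per_node):
    """Map each ring partition to a node, round-robin (assumed rule)."""
    total = len(nodes) * vnodes_per_node
    return {p: nodes[p % len(nodes)] for p in range(total)}


def walk_links(objects, start_key, tag, max_hops):
    """Follow links carrying `tag` for up to `max_hops` hops.

    Returns the keys reached, in the order they were first visited.
    """
    seen = {start_key}
    frontier = [start_key]
    reached = []
    for _ in range(max_hops):
        next_frontier = []
        for key in frontier:
            for target, link_tag in objects[key]["links"]:
                if link_tag == tag and target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
                    reached.append(target)
        frontier = next_frontier
    return reached
```

With these definitions, `assign_partitions(NODES, VNODES_PER_NODE)` yields 32 partitions, eight per node, and `walk_links(SAMPLE_OBJECTS, "alice", "friend", 2)` visits `bob` and then `carol`. Because the tag is chosen at query time, the same objects can be traversed along different relationships.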
Other popular NoSQL implementations are document databases (Couchbase,
MongoDB, and others) and graph databases (Neo4j). Let us understand the premise
behind the document database and graph database architectures.
Document-oriented databases, or document databases, can be defined as a schemaless
and flexible model of storing data as documents, rather than relational structures. The
