Building Big Data Applications
Chapter 2 Infrastructure and technology
Read consistency
The read consistency level specifies how many replicas must respond before a result is returned to the client application. When a read request is made, Cassandra checks the specified number of replicas and, based on their timestamps, returns the most recent data to satisfy the read request.
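The behavior above can be sketched in a few lines of Python. This is an illustrative model, not driver code: the `ReplicaResponse` type and `resolve_read` function are hypothetical names, and the timestamps stand in for Cassandra's per-cell write timestamps.

```python
from dataclasses import dataclass

@dataclass
class ReplicaResponse:
    node: str
    value: str
    timestamp: int  # write timestamp, standing in for Cassandra's per-cell timestamp

def resolve_read(responses, required):
    """Return the newest value once `required` replicas have answered."""
    if len(responses) < required:
        raise RuntimeError("not enough replicas responded for this consistency level")
    answered = responses[:required]
    # The replica with the highest timestamp holds the most recent write
    newest = max(answered, key=lambda r: r.timestamp)
    return newest.value

responses = [
    ReplicaResponse("node1", "IL", timestamp=100),
    ReplicaResponse("node2", "WI", timestamp=205),  # most recent write
    ReplicaResponse("node3", "IL", timestamp=100),
]
print(resolve_read(responses, required=2))  # QUORUM of 3 replicas = 2 -> "WI"
```

With `required=2`, the coordinator waits for only two of the three replicas and still returns the value carrying the newest timestamp among those that answered.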
Note: the Local Quorum and Each Quorum consistency levels are defined for large multi-data center configurations.
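The quorum arithmetic behind these levels is simple and worth seeing directly. A quorum is floor(replication_factor / 2) + 1; Each Quorum requires a quorum of replicas in every data center. A minimal sketch (function names are illustrative):

```python
def quorum(rf):
    """Quorum for a given replication factor: floor(rf / 2) + 1."""
    return rf // 2 + 1

def each_quorum(rf_per_dc):
    """EACH_QUORUM: a quorum of replicas must respond in every data center."""
    return {dc: quorum(rf) for dc, rf in rf_per_dc.items()}

print(quorum(3))                          # 2 of 3 replicas
print(quorum(5))                          # 3 of 5 replicas
print(each_quorum({"dc1": 3, "dc2": 5}))  # {'dc1': 2, 'dc2': 3}
```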
Specifying client consistency levels
Consistency level is specified by the client application when a read or write request is
made. For example,
SELECT * FROM CUSTOMERS WHERE STATE = 'IL' USING CONSISTENCY QUORUM;
Built-in consistency repair features
Cassandra has a number of built-in repair features to ensure that data remains consistent across replicas:
Read repair is a technique that ensures all nodes in a cluster are synchronized with the latest version of the data. When Cassandra detects that several nodes in the cluster are out of sync, it marks those nodes with a read repair flag. This triggers a process of synchronizing the stale nodes with the newest version of the requested data. Inconsistent data is detected by comparing the clock value of each replica's data with the clock value of the newest data; any node with a clock value older than the newest data is effectively flagged as stale.
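The read repair step can be sketched as follows. This is a hypothetical model (the `read_repair` function and the dict-based replica store are illustrative, not Cassandra internals): it finds the newest clock value, flags any replica that is older, and pushes the newest version to those stale replicas.

```python
def read_repair(replicas):
    """replicas: dict node -> (clock, value). Returns the nodes that were stale."""
    # The replica with the highest clock value holds the newest data
    newest_clock, newest_value = max(replicas.values())
    stale = [n for n, (clock, _) in replicas.items() if clock < newest_clock]
    for node in stale:
        # Synchronize each stale node with the newest version of the data
        replicas[node] = (newest_clock, newest_value)
    return stale

replicas = {"n1": (7, "new"), "n2": (3, "old"), "n3": (7, "new")}
print(read_repair(replicas))  # ['n2'] was stale
print(replicas["n2"])         # (7, 'new') after repair
```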
Anti-entropy node repair is a process that is run as part of maintenance and invoked through the Nodetool utility. It is a synchronized operation across the entire cluster in which the nodes are updated to be consistent. It is not an automatic process and needs manual intervention. During this process, the nodes exchange information represented as Merkle trees, and if the tree information is not consistent, a reconciliation exercise is carried out. This feature comes from Amazon Dynamo, with the difference that in Cassandra each column family maintains its own Merkle tree.
A quick note: a Merkle tree is a hash-tree technique for hierarchical verification and authentication. When replicas are down for extended periods, the Merkle tree lets Cassandra check small portions of the replicas to find exactly where the sync is broken, enabling a quick recovery (for information on Merkle trees, see Ralph Merkle's webpage, www.merkle.com).
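A minimal Merkle-tree sketch shows why this comparison is cheap: each replica hashes its data partitions into leaves, pairs the hashes upward into a single root, and two replicas only need to exchange root hashes to know whether they agree. This is an illustrative implementation, not Cassandra's; the function names are assumptions.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash each leaf, then pair hashes level by level up to a single root."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                  # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

replica_a = [b"row1", b"row2", b"row3", b"row4"]
replica_b = [b"row1", b"row2", b"rowX", b"row4"]  # one divergent partition
print(merkle_root(replica_a) == merkle_root(replica_a))  # True: replicas in sync
print(merkle_root(replica_a) == merkle_root(replica_b))  # False: reconciliation needed
```

A mismatch at the root can then be chased down one subtree at a time, so only the divergent partitions are ever transferred rather than the whole data set.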
Hinted handoff: during a write operation, data is sent to all replicas by default. If a node is down at that time, the data is stored as a hint to be replayed when the node comes back. If all nodes in a replica set are down, the hint and the data are stored in the coordinator. This process is called a hinted handoff. No operation is permitted on the node until all nodes are restored and synchronized.
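The hinted handoff flow can be sketched as a toy coordinator. This is a hypothetical model, not Cassandra code: the `Coordinator` class and its methods are illustrative names for the behavior described above, writing to live replicas, keeping hints for down nodes, and replaying them on recovery.

```python
class Coordinator:
    def __init__(self):
        self.hints = {}   # node name -> list of (key, value) writes to replay
        self.nodes = {}   # node name -> {"up": bool, "data": dict}

    def add_node(self, name, up=True):
        self.nodes[name] = {"up": up, "data": {}}

    def write(self, key, value):
        for name, node in self.nodes.items():
            if node["up"]:
                node["data"][key] = value
            else:
                # Node is down: store a hint on the coordinator instead
                self.hints.setdefault(name, []).append((key, value))

    def node_recovered(self, name):
        self.nodes[name]["up"] = True
        # Replay the stored hints so the node catches up on missed writes
        for key, value in self.hints.pop(name, []):
            self.nodes[name]["data"][key] = value

c = Coordinator()
c.add_node("n1")
c.add_node("n2", up=False)
c.write("customer:42", "IL")
print(c.nodes["n2"]["data"])   # {} : n2 missed the write while down
c.node_recovered("n2")
print(c.nodes["n2"]["data"])   # {'customer:42': 'IL'} after hint replay
```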