                 Read consistency

Read consistency level specifies how many replicas must respond before a result is returned to the client application. When a read request is made, Cassandra checks the specified number of replicas and returns the most recent data, determined by timestamp, to satisfy the read request.

Note: Local Quorum and Each Quorum are defined for large multiple data center configurations.
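To make the replica arithmetic concrete, the standard single data center rule is that QUORUM requires a majority of the replicas, that is, (replication factor / 2) + 1 with integer division. A minimal illustrative Python snippet (the function name is ours, not a driver API):

def replicas_required(level: str, rf: int) -> int:
    # Replica acknowledgments needed per consistency level, single data center.
    # QUORUM follows the standard rule: rf // 2 + 1 (integer division).
    return {"ONE": 1, "TWO": 2, "THREE": 3,
            "QUORUM": rf // 2 + 1, "ALL": rf}[level]

for rf in (3, 5):
    print(rf, replicas_required("QUORUM", rf))   # rf=3 -> 2, rf=5 -> 3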

                 Specifying client consistency levels
Consistency level is specified by the client application when a read or write request is made. For example:

SELECT * FROM CUSTOMERS WHERE STATE = 'IL' USING CONSISTENCY QUORUM;
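The USING CONSISTENCY clause above comes from early CQL; CQL 3 removed it, and the consistency level is now set through the client driver instead. A minimal sketch with the DataStax Python driver (the contact point and keyspace are placeholders, and the query assumes STATE is queryable, for example via a secondary index):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])          # placeholder contact point
session = cluster.connect("sales")        # placeholder keyspace

# The consistency level travels with the statement, not with the CQL text.
query = SimpleStatement(
    "SELECT * FROM customers WHERE state = 'IL'",
    consistency_level=ConsistencyLevel.QUORUM,
)
for row in session.execute(query):
    print(row)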

                 Built-in consistency repair features

Cassandra has a number of built-in repair features to ensure that data remains consistent across replicas:

Read repair is a technique that ensures that all nodes in a cluster are synchronized with the latest version of data. When Cassandra detects that several nodes in the cluster are out of sync, it marks them with a read repair flag. This triggers a process that synchronizes the stale nodes with the newest version of the requested data. The check for inconsistent data compares the clock value of each replica's data against the clock value of the newest data; any node whose clock value is older than the newest data is effectively flagged as stale (see the sketch after this list for this timestamp comparison in miniature).
Anti-entropy node repair is a process that is run as part of maintenance and invoked through the nodetool utility. It is a synchronized operation across the entire cluster in which the nodes are updated to be consistent. It is not an automatic process and needs manual intervention. During this process, the nodes exchange information represented as Merkle trees, and if the tree information is not consistent, a reconciliation exercise is carried out. This feature comes from Amazon Dynamo, with the difference that in Cassandra each column family maintains its own Merkle tree.

A quick note: a Merkle tree is a hash hierarchy used as a verification and authentication technique. When replicas are down for extended periods, the Merkle tree makes it possible to check small portions of the replicas and locate exactly where synchronization is broken, enabling a quick recovery (for more information on Merkle trees, see Ralph Merkle's webpage, www.merkle.com).
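To illustrate the idea, here is a small self-contained Python sketch that builds a Merkle tree over a replica's data ranges and descends from the roots to find the out-of-sync leaves. It is illustrative only, with names of our choosing rather than Cassandra's implementation, and it assumes both replicas hash the same number of ranges:

import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(blocks):
    # Bottom-up Merkle tree: levels[0] holds leaf hashes, levels[-1] the root.
    level = [_h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            right = level[i + 1] if i + 1 < len(level) else level[i]
            nxt.append(_h(level[i] + right))
        level = nxt
        levels.append(level)
    return levels

def find_stale_ranges(a, b, depth=None, idx=0):
    # Descend from the roots; return leaf indexes where the trees disagree.
    if depth is None:
        depth = len(a) - 1                 # start at the root level
    if a[depth][idx] == b[depth][idx]:
        return []                          # whole subtree matches: nothing to repair
    if depth == 0:
        return [idx]                       # a single out-of-sync data range
    stale = []
    for child in (2 * idx, 2 * idx + 1):
        if child < len(a[depth - 1]):
            stale += find_stale_ranges(a, b, depth - 1, child)
    return stale

replica_a = build_levels([b"row1", b"row2", b"row3", b"row4"])
replica_b = build_levels([b"row1", b"rowX", b"row3", b"row4"])
print(find_stale_ranges(replica_a, replica_b))   # -> [1]: only range 1 differs

Comparing just the roots answers whether two replicas are in sync at all; descending only into mismatched subtrees means the data exchanged during reconciliation is proportional to the differences, not to the full replica size.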
Hinted handoff: during a write operation, data is sent to all replicas by default. If a node is down at that time, the data is stored as a hint to be replayed when the node comes back. If all replica nodes are down, the hint and the data are stored on the coordinator node. This process is called a hinted handoff. No operation is permitted on the node until all nodes are restored and synchronized.
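To tie the last two mechanisms together, the following conceptual Python sketch shows a coordinator that queues hints for unreachable replicas, replays them when a node recovers, and resolves reads by newest timestamp in the way read repair does. Every name here is hypothetical; this is a teaching sketch, not Cassandra's internals:

import time
from collections import defaultdict

class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas            # node_id -> replica object
        self.hints = defaultdict(list)      # node_id -> mutations awaiting handoff

    def write(self, key, value):
        mutation = (key, value, time.time())          # timestamped, like a write clock
        for node_id, replica in self.replicas.items():
            try:
                replica.apply(mutation)
            except ConnectionError:
                # Hinted handoff: the node is down, so keep the mutation
                # locally and hand it off once the node comes back.
                self.hints[node_id].append(mutation)

    def replay_hints(self, node_id):
        # Run when a node is detected as up again.
        for mutation in self.hints.pop(node_id, []):
            self.replicas[node_id].apply(mutation)

    def read(self, key, node_ids):
        # Read repair in miniature: ask several replicas and keep the value
        # with the newest clock; stale replicas would then be rewritten.
        responses = [self.replicas[n].get(key) for n in node_ids]
        return max(responses, key=lambda r: r[2])     # r = (key, value, clock)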