Page 69 - Building Big Data Applications

Chapter 2   Infrastructure and technology  63


                   As we have learned so far, a keyspace provides the data structure for Cassandra to
                 store the column families and the subgroups. To store the keyspace and the metadata
                 associated with it, Cassandra provides the architecture of a cluster, often referred to as the
                 “ring”. Cassandra distributes data to the nodes by arranging them in a ring that forms the
                 cluster.

                 Data partitioning

                 Data partitioning can be done either by the client library or by any node of the cluster
                 and can be calculated using different algorithms; there are two native algorithms that are
                 provided with Cassandra:
                   The first algorithm is the RandomPartitioner, a hash-based distribution in which the
                   keys are spread more evenly across the different nodes, providing better load
                   balancing. With this partitioner, each row and all the columns associated with the
                   row key are stored on the same physical node, and columns are sorted by
                   name.
                   The second algorithm is the OrderPreservingPartitioner, which creates partitions based
                   on the key, so data is grouped by key order. This boosts the performance of range
                   queries, since a query needs to hit fewer nodes to retrieve all the ranges
                   of data.
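
                 The difference between the two partitioners can be illustrated with a minimal
                 Python sketch. It is not Cassandra's implementation: the four node names, the
                 evenly spaced tokens, and the split keys "g", "n", "t" are all assumptions chosen
                 for illustration. The hash-based function reduces an MD5 digest to a token on
                 the ring, while the order-preserving function uses the key itself as the token.

```python
import bisect
import hashlib

# Hypothetical four-node ring; names and token values are illustrative only.
NODES = ["node-a", "node-b", "node-c", "node-d"]
RING_SIZE = 2 ** 127

# Hash-based placement: evenly spaced tokens, each node owns the
# range of tokens that ends at its own token.
HASH_TOKENS = [(i + 1) * RING_SIZE // len(NODES) for i in range(len(NODES))]


def md5_token(key: str) -> int:
    """Map a row key to a token in [0, 2**127) using its MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def hash_owner(key: str) -> str:
    """Node that owns the key under hash-based partitioning."""
    idx = bisect.bisect_left(HASH_TOKENS, md5_token(key))
    return NODES[idx % len(NODES)]


# Order-preserving placement: the key itself is the token.
# Keys <= "g" go to node-a, <= "n" to node-b, <= "t" to node-c, the rest to node-d.
ORDERED_SPLITS = ["g", "n", "t"]


def ordered_owner(key: str) -> str:
    """Node that owns the key under order-preserving partitioning."""
    return NODES[bisect.bisect_left(ORDERED_SPLITS, key)]


if __name__ == "__main__":
    keys = ["apple", "avocado", "banana", "cherry", "date", "fig"]
    print("hash-based nodes:      ", sorted({hash_owner(k) for k in keys}))
    print("order-preserving nodes:", sorted({ordered_owner(k) for k in keys}))
```

                 In this sketch a range query over keys from "apple" to "fig" touches a single
                 node under order-preserving placement, whereas hash-based placement can scatter
                 the same keys over several nodes. The trade-off is that contiguous keys
                 concentrate load on one node.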

                 Data sorting

                 When defining a column family, you can specify how its columns will be sorted when results
                 are returned to the client. Columns are sorted by the “compare with” type defined on
                 their enclosing column family. You can specify a custom sort order; the options provided
                 by default are as follows:
                   BytesType: Simple sort by byte value. No validation is performed.
                   AsciiType: Similar to BytesType but validates that the input can be parsed as
                   US-ASCII.
                   UTF8Type: A string encoded as UTF-8
                   LongType: A 64-bit long
                   LexicalUUIDType: A 128-bit UUID, compared lexically (by byte value)
                   TimeUUIDType: A 128-bit version 1 UUID, compared by timestamp
                   IntegerType: Faster than a long; supports fewer or longer lengths.
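
                 To show why the choice of comparator matters, the following Python sketch sorts
                 the same column names in different ways. It only mimics the comparison rules
                 described above and does not use Cassandra itself. The helper names
                 (long_name, v1_uuid) and the sample values are assumptions made for
                 illustration.

```python
import struct
import uuid


def long_name(n: int) -> bytes:
    """Encode a column name as a 64-bit big-endian signed long."""
    return struct.pack(">q", n)


def decode_long(raw: bytes) -> int:
    """Decode a 64-bit big-endian signed long."""
    return struct.unpack(">q", raw)[0]


def v1_uuid(timestamp: int) -> uuid.UUID:
    """Build a version 1 UUID from a 60-bit timestamp (fixed clock sequence and node)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version nibble = 1
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 1))


names = [long_name(n) for n in (10, -1, 256, 2)]

# BytesType-like ordering: raw byte comparison, so -1 (all 0xFF bytes) sorts last.
bytes_order = [decode_long(raw) for raw in sorted(names)]

# LongType-like ordering: decode first, then compare numerically.
long_order = sorted(decode_long(raw) for raw in names)

early = v1_uuid(0x0FFFFFFFF)   # earlier timestamp, but time_low is 0xFFFFFFFF
late = v1_uuid(0x100000000)    # later timestamp, but time_low is 0x00000000

# LexicalUUIDType-like ordering: compare the 16 raw bytes.
lexical_order = sorted([early, late], key=lambda u: u.bytes)

# TimeUUIDType-like ordering: compare the embedded timestamp.
time_order = sorted([early, late], key=lambda u: u.time)

if __name__ == "__main__":
    print("bytes order:", bytes_order)
    print("long order: ", long_order)
    print("lexical UUID order == time order?", lexical_order == time_order)
```

                 In this sketch the byte-wise ordering places the negative value last, and the
                 lexical UUID ordering reverses the chronological one, so the comparator has to
                 match the way the application expects to read its columns back.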

                 Consistency management

                 The architecture model for Cassandra is AP (availability and partition tolerance, in
                 CAP terms) with eventual consistency. Cassandra’s consistency is measured by how recent
                 and synchronized all replicas of one row of data are. Although the database is built on
                 an eventual consistency model, real-world applications will mandate consistency for all
                 read and write operations. In order to manage the