Page 67 - Building Big Data Applications

Chapter 2   Infrastructure and technology  61












                                        FIGURE 2.23 Cassandra key-value pair (column).


                   A key-value pair can represent simple storage for Person/Name type data, but it
                 does not scale far. The basic model is therefore altered so that the value part itself
                 holds a name and a value. This gives a structure in which a key is associated with
                 name-value pairs and can carry multiple values, creating the table-like structure
                 shown in Fig. 2.23.
                   In the updated key-value notation, we can store Person/Name/John Doe, add
                 another column such as Person/Age/30, and so build up multiple storage structures.
                 This defines the most basic structure in the Cassandra data model, called a
                 “column”.
                   Column: an ordered list of values stored as a name-value pair. It is composed of a
                 column name, a column value, and a third element called the timestamp. The server
                 uses the timestamp to resolve conflicts when it has to manage conflicting values or
                 columns. The client sets the timestamp explicitly, and it is stored along with the
                 data.
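
                   The name/value/timestamp triple and its role in conflict resolution can be
                 sketched as follows. This is a minimal illustration, not Cassandra's implementation:
                 the `Column` class and the `resolve` function are hypothetical names, and the rule
                 that the larger timestamp wins (with ties keeping the first argument) is a
                 simplifying assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Column:
    """One column: a name, a value, and a client-supplied timestamp."""
    name: str
    value: Any
    timestamp: int  # set explicitly by the client, e.g. microseconds since the epoch


def resolve(a: Column, b: Column) -> Column:
    """Pick the winner between two conflicting versions of the same column."""
    if a.name != b.name:
        raise ValueError("columns with different names do not conflict")
    # Assumption: the version with the larger timestamp wins; ties keep `a`.
    return a if a.timestamp >= b.timestamp else b


older = Column("Age", 30, timestamp=1_000)
newer = Column("Age", 31, timestamp=2_000)
winner = resolve(older, newer)
```

                   Because the timestamp travels with the data, the order in which the two versions
                 reach the server does not change the outcome in this sketch.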
                   A column can hold any type of data in this model, from characters to GUIDs to
                 blobs. Columns can be grouped into a row identified by a row key. A single column by
                 itself limits the values you can represent; to add flexibility, a group of columns
                 belonging to a key can be stored together as a column family. A column family can be
                 loosely compared to a table in a relational database.
                   Column family: a logical and physical grouping of a set of columns that can be
                 represented by a single key. The flexibility of a column family is that the column
                 names can vary from one row to another, and the number of columns can vary over
                 time (Fig. 2.24).
                   There is no restriction on creating different column structures within a column
                 family; however, maintaining them is the responsibility of the application that
                 creates them. Conceptually, this is similar to overloading in object-oriented
                 programming languages.
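
                   A column family with rows of differing shape can be modeled with nested
                 dictionaries, as in the sketch below. The `insert` and `column_names` helpers and
                 the sample rows are hypothetical and only illustrate the idea; they are not part of
                 any Cassandra client API.

```python
from typing import Any, Dict, List, Tuple

# row key -> column name -> (value, timestamp)
ColumnFamily = Dict[str, Dict[str, Tuple[Any, int]]]


def insert(cf: ColumnFamily, row_key: str, name: str, value: Any, timestamp: int) -> None:
    """Write one column into a row, keeping the stored column if it is newer."""
    row = cf.setdefault(row_key, {})
    current = row.get(name)
    if current is None or timestamp >= current[1]:
        row[name] = (value, timestamp)


def column_names(cf: ColumnFamily, row_key: str) -> List[str]:
    """Return the column names of one row in sorted order."""
    return sorted(cf[row_key])


person: ColumnFamily = {}
insert(person, "jdoe", "Name", "John Doe", 1)
insert(person, "jdoe", "Age", 30, 1)
insert(person, "asmith", "Name", "Ann Smith", 1)
insert(person, "asmith", "Email", "ann@example.com", 2)
```

                   Here the row `jdoe` carries the columns Age and Name while `asmith` carries Email
                 and Name. Nothing in the structure enforces a common set of columns, so the
                 application has to keep track of which columns each row is expected to hold.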



                              RowKey  |  Name   |  Name   |  Name   |  …
                                      |  Value  |  Value  |  Value  |  …


                                         FIGURE 2.24 A column family representation.