Page 68 - Building Big Data Applications


                If we want to group column families together to create or manage the
             relationships between them, the Cassandra model provides a super column family.
                A super column family is a logical and physical grouping of column families that
             can be represented by a single key. The strength of this model is that you can represent
             relationships, hierarchies, and tree-like traversal in a simple and flexible manner.
                To create a meaningful data structure or architecture, one or more column families
             or super column families need to be grouped in one set under a common key. In
             Cassandra, a keyspace defines that set of column families grouped under one key.
             By analogy with a spreadsheet, we can decompose this as follows:
                Excel Document / Sheet 1 / columns / formulas / Sheet 2 (columns/formulas) /
             Sheet 2 (other columns/formulas), and so on. You can define a single keyspace for an
             application; this is preferred over creating thousands of keyspaces for one application.
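
                The hierarchy described above can be pictured as nested maps. The following is a
             minimal sketch in Python, not Cassandra's API: the keyspace name, the column family
             names, the row keys, and the helper get_column are all hypothetical and chosen only
             for illustration. It assumes the common picture in which a standard column family
             maps a row key to columns and a super column family adds one more level of grouping.

```python
# Hypothetical in-memory picture of keyspace -> column family -> row -> columns.
# This is an illustration of the nesting only; it is not how Cassandra stores data.
keyspace = {
    "name": "app_keyspace",  # hypothetical: one keyspace per application
    "column_families": {
        # Standard column family: row key -> {column name: value}
        "users": {
            "user42": {"name": "Ada", "email": "ada@example.com"},
        },
        # Super column family: row key -> super column -> {column name: value}
        "user_activity": {
            "user42": {
                "2024-01-01": {"logins": "3", "pages": "17"},
                "2024-01-02": {"logins": "1", "pages": "4"},
            },
        },
    },
}


def get_column(ks, column_family, row_key, column, super_column=None):
    """Walk the nested maps down to a single column value."""
    row = ks["column_families"][column_family][row_key]
    if super_column is not None:
        row = row[super_column]  # one extra level for super column families
    return row[column]


print(get_column(keyspace, "users", "user42", "name"))
print(get_column(keyspace, "user_activity", "user42", "pages", super_column="2024-01-02"))
```

                The extra level in the second lookup is what allows hierarchies and tree-like
             traversal to be expressed under a single row key.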


             A keyspace has configurable properties that are critical to understand

               Replication factor: refers to the number of nodes that hold copies (replicas) of
                each row of data. If your replication factor is 2, then two nodes will have copies of
                each row. Data replication is transparent. Replication factor is one of the ways of
                controlling consistency within Cassandra and is a tunable parameter for balancing
                performance and scalability.
               Replica placement strategy: refers to how the replicas will be placed in the
                deployment (the ring; we will discuss this in the architecture section). Two
                strategies are provided to configure which nodes will get copies of which keys:
                SimpleStrategy (defined at keyspace creation) and NetworkTopologyStrategy
                (replication across datacenters).
               Column families: each keyspace has one or more column families.
                A column family has configurable parameters, described in Fig. 2.25.

                             Parameter                      Default Value
                             column_type                    Standard
                             compaction_strategy            SizeTieredCompactionStrategy
                             comparator                     BytesType
                             compare_subcolumns_with        BytesType
                             dc_local_read_repair_chance    0
                             gc_grace_seconds               864000 (10 days)
                             keys_cached                    200000
                             max_compaction_threshold       32
                             min_compaction_threshold       4
                             read_repair_chance             0.1 or 1 (See description below.)
                             replicate_on_write             TRUE
                             rows_cached                    0 (disabled by default)
                        FIGURE 2.25 Column family parameters (for details see Apache or Datastax website).
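
                To make the interaction between replication factor and replica placement more
             concrete, here is a minimal sketch of ring-based placement in the spirit of
             SimpleStrategy: the first replica goes to the node that owns the key's position on
             the ring, and the remaining replicas go to the next nodes clockwise. It is an
             illustration under stated assumptions, not Cassandra's implementation. The node
             names, the evenly spaced tokens, and the functions token, build_ring, and
             simple_replicas are hypothetical; MD5 is used only as a convenient hash.

```python
import bisect
import hashlib

TOKEN_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def token(row_key: str) -> int:
    """Map a row key to a position on the ring (MD5-based, for illustration)."""
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)


def build_ring(node_names):
    """Give each node an evenly spaced token; returns (token, node) pairs sorted by token."""
    step = TOKEN_SPACE // len(node_names)
    return [(i * step, name) for i, name in enumerate(node_names)]


def simple_replicas(ring, row_key, replication_factor):
    """Pick the first node whose token is >= the key's token, then walk clockwise."""
    count = min(replication_factor, len(ring))  # cannot have more replicas than nodes
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(row_key)) % len(ring)  # wrap past the last token
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = build_ring(["node_a", "node_b", "node_c", "node_d"])  # hypothetical node names
replicas = simple_replicas(ring, "user42", replication_factor=2)
print(replicas)  # two neighbouring nodes on the ring
```

                With a replication factor of 2, every row key resolves to two adjacent nodes, which
             matches the statement above that two nodes will hold copies of each row. Placement
             across datacenters, as in NetworkTopologyStrategy, would need additional rack and
             datacenter information that this sketch does not model.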