Page 247 -
P. 247

HAN
                               12-ch05-187-242-9780123814791

          210   Chapter 5 Data Cube Technology              2011/6/1  3:19 Page 210  #24



                         on b 1 just as it did on b∗. This traversal and the resultant trees are shown in Figure 5.12.
                         The child trees ACD/A and ABD/AB are created again but now with the new values
                         from the b 1 subtree. For example, notice that the aggregate count of c∗ in the ACD/A
                         tree has increased from 1 to 3. The trees that remained intact during the last traversal
                         are reused and the new aggregate values are added on. For instance, another branch is
                         added to the BCD tree.
                           Just like before, the algorithm will reach a leaf node at d∗ and traverse back. This
                         time, it will reach a 1 and notice that there exists a sibling in a 2 . In this case, all child
                         trees except BCD in Figure 5.12 are destroyed. Afterward, the algorithm will perform
                         the same traversal on a 2 . BCD continues to grow while the other subtrees start fresh
                         with a 2 instead of a 1 .

                           A node must satisfy two conditions in order to generate child trees: (1) the measure
                         of the node must satisfy the iceberg condition; and (2) the tree to be generated must
                         include at least one non-star-node (i.e., nontrivial). This is because if all the nodes were
                         star-nodes, then none of them would satisfy min sup. Therefore, it would be a complete
                         waste to compute them. This pruning is observed in Figures 5.11 and 5.12. For example,
                         the left subtree extending from node a 1 in the base tree in Figure 5.11 does not include
                         any nonstar-nodes. Therefore, the a 1 CD/a 1 subtree should not have been generated. It
                         is shown, however, for illustration of the child tree generation process.
                           Star-Cubing is sensitive to the ordering of dimensions, as with other iceberg cube
                         construction algorithms. For best performance, the dimensions are processed in order
                         of decreasing cardinality. This leads to a better chance of early pruning, because the
                         higher the cardinality, the smaller the partitions, and therefore the higher possibility
                         that the partition will be pruned.
                           Star-Cubing can also be used for full cube computation. When computing the full
                         cube for a dense data set, Star-Cubing’s performance is comparable with MultiWay and
                         is much faster than BUC. If the data set is sparse, Star-Cubing is significantly faster
                         than MultiWay and faster than BUC, in most cases. For iceberg cube computation, Star-
                         Cubing is faster than BUC, where the data are skewed and the speed-up factor increases
                         as min sup decreases.

                   5.2.4 Precomputing Shell Fragments for Fast
                         High-Dimensional OLAP
                         Recall the reason that we are interested in precomputing data cubes: Data cubes facil-
                         itate fast OLAP in a multidimensional data space. However, a full data cube of high
                         dimensionality needs massive storage space and unrealistic computation time. Iceberg
                         cubes provide a more feasible alternative, as we have seen, wherein the iceberg con-
                         dition is used to specify the computation of only a subset of the full cube’s cells.
                         However, although an iceberg cube is smaller and requires less computation time than
                         its corresponding full cube, it is not an ultimate solution.
                           For one, the computation and storage of the iceberg cube can still be costly. For exam-
                         ple, if the base cuboid cell, (a 1 , a 2 ,..., a 60 ), passes minimum support (or the iceberg
   242   243   244   245   246   247   248   249   250   251   252