Page 241 -
P. 241

HAN
                               12-ch05-187-242-9780123814791

          204   Chapter 5 Data Cube Technology              2011/6/1  3:19 Page 204  #18



                         to each of the tuple’s descendant group-by’s. This is particularly useful since, in practice,
                         many partitions have a single tuple.

                           The BUC performance is sensitive to the order of the dimensions and to skew in the
                         data. Ideally, the most discriminating dimensions should be processed first. Dimensions
                         should be processed in the order of decreasing cardinality. The higher the cardinality,
                         the smaller the partitions, and thus the more partitions there will be, thereby providing
                         BUC with a greater opportunity for pruning. Similarly, the more uniform a dimension
                         (i.e., having less skew), the better it is for pruning.
                           BUC’s major contribution is the idea of sharing partitioning costs. However, unlike
                         MultiWay, it does not share the computation of aggregates between parent and child
                         group-by’s. For example, the computation of cuboid AB does not help that of ABC. The
                         latter needs to be computed essentially from scratch.


                   5.2.3 Star-Cubing: Computing Iceberg Cubes Using
                         a Dynamic Star-Tree Structure

                         In this section, we describe the Star-Cubing algorithm for computing iceberg cubes.
                         Star-Cubing combines the strengths of the other methods we have studied up to this
                         point. It integrates top-down and bottom-up cube computation and explores both
                         multidimensional aggregation (similar to MultiWay) and Apriori-like pruning (simi-
                         lar to BUC). It operates from a data structure called a star-tree, which performs lossless
                         data compression, thereby reducing the computation time and memory requirements.
                           The Star-Cubing algorithm explores both the bottom-up and top-down computa-
                         tion models as follows: On the global computation order, it uses the bottom-up model.
                         However, it has a sublayer underneath based on the top-down model, which explores the
                         notion of shared dimensions, as we shall see in the following. This integration allows the
                         algorithm to aggregate on multiple dimensions while still partitioning parent group-by’s
                         and pruning child group-by’s that do not satisfy the iceberg condition.
                           Star-Cubing’s approach is illustrated in Figure 5.8 for a 4-D data cube computation.
                         If we were to follow only the bottom-up model (similar to MultiWay), then the cuboids
                         marked as pruned by Star-Cubing would still be explored. Star-Cubing is able to prune
                         the indicated cuboids because it considers shared dimensions. ACD/A means cuboid
                         ACD has shared dimension A, ABD/AB means cuboid ABD has shared dimension AB,
                         ABC/ABC means cuboid ABC has shared dimension ABC, and so on. This comes from
                         the generalization that all the cuboids in the subtree rooted at ACD include dimension
                         A, all those rooted at ABD include dimensions AB, and all those rooted at ABC include
                         dimensions ABC (even though there is only one such cuboid). We call these common
                         dimensions the shared dimensions of those particular subtrees.
                           The introduction of shared dimensions facilitates shared computation. Because the
                         shared dimensions are identified early on in the tree expansion, we can avoid recom-
                         puting them later. For example, cuboid AB extending from ABD in Figure 5.8 would
                         actually be pruned because AB was already computed in ABD/AB. Similarly, cuboid
   236   237   238   239   240   241   242   243   244   245   246