to each of the tuple’s descendant group-by’s. This is particularly useful since, in practice,
many partitions have a single tuple.
BUC's performance is sensitive to the order of the dimensions and to skew in the
data. Ideally, the most discriminating dimensions should be processed first. Dimensions
should be processed in the order of decreasing cardinality. The higher the cardinality,
the smaller the partitions, and thus the more partitions there will be, thereby providing
BUC with a greater opportunity for pruning. Similarly, the more uniform a dimension
(i.e., having less skew), the better it is for pruning.
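To make this ordering heuristic concrete, the following Python sketch (illustrative only; the function name and the toy relation are ours, not part of the BUC algorithm) ranks dimensions by decreasing cardinality before they are handed to BUC:

# Sketch only: order dimensions by decreasing cardinality before running BUC.
def order_dimensions(tuples, num_dims):
    """Return dimension indices sorted by decreasing cardinality."""
    cardinality = [len({t[d] for t in tuples}) for d in range(num_dims)]
    return sorted(range(num_dims), key=lambda d: cardinality[d], reverse=True)

rows = [("a1", "b1", "c2"), ("a2", "b1", "c1"), ("a3", "b2", "c1")]
print(order_dimensions(rows, 3))  # [0, 1, 2]: dimension A (3 distinct values) is processed first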
BUC’s major contribution is the idea of sharing partitioning costs. However, unlike
MultiWay, it does not share the computation of aggregates between parent and child
group-by’s. For example, the computation of cuboid AB does not help that of ABC. The
latter needs to be computed essentially from scratch.
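Both points can be seen in the following simplified Python sketch of BUC-style recursion with a count measure and an iceberg threshold (the names are ours, and the full algorithm includes refinements such as the single-tuple optimization noted above): each recursive call partitions only its parent's tuples, so partitioning work is shared, but the aggregate for every child group-by is recomputed from those tuples rather than derived from the parent's aggregate.

from collections import defaultdict

def buc(partition, dims, min_sup, prefix=(), result=None):
    """Simplified BUC sketch: count measure, iceberg threshold min_sup.
    partition: list of tuples; dims: dimension indices still available,
    assumed ordered by decreasing cardinality."""
    if result is None:
        result = {}
    result[prefix] = len(partition)           # aggregate for the current group-by
    for i, d in enumerate(dims):
        groups = defaultdict(list)            # partitioning cost is shared:
        for t in partition:                   # each child works only on its
            groups[t[d]].append(t)            # parent's partition
        for value, sub in groups.items():
            if len(sub) >= min_sup:           # Apriori-style pruning
                # The child's aggregate is recomputed from sub; nothing
                # aggregated for the parent group-by is reused.
                buc(sub, dims[i + 1:], min_sup, prefix + ((d, value),), result)
    return result

rows = [("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a2", "b2", "c1")]
iceberg_cube = buc(rows, [0, 1, 2], min_sup=2)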
5.2.3 Star-Cubing: Computing Iceberg Cubes Using
a Dynamic Star-Tree Structure
In this section, we describe the Star-Cubing algorithm for computing iceberg cubes.
Star-Cubing combines the strengths of the other methods we have studied up to this
point. It integrates top-down and bottom-up cube computation and explores both
multidimensional aggregation (similar to MultiWay) and Apriori-like pruning (simi-
lar to BUC). It operates from a data structure called a star-tree, which performs lossless
data compression, thereby reducing the computation time and memory requirements.
The Star-Cubing algorithm explores both the bottom-up and top-down computa-
tion models as follows: On the global computation order, it uses the bottom-up model.
However, it has a sublayer underneath based on the top-down model, which explores the
notion of shared dimensions, as we shall see in the following. This integration allows the
algorithm to aggregate on multiple dimensions while still partitioning parent group-by’s
and pruning child group-by’s that do not satisfy the iceberg condition.
Star-Cubing’s approach is illustrated in Figure 5.8 for a 4-D data cube computation.
If we were to follow only the bottom-up model (similar to MultiWay), then the cuboids
marked as pruned by Star-Cubing would still be explored. Star-Cubing is able to prune
the indicated cuboids because it considers shared dimensions. ACD/A means cuboid
ACD has shared dimension A, ABD/AB means cuboid ABD has shared dimension AB,
ABC/ABC means cuboid ABC has shared dimension ABC, and so on. This comes from
the generalization that all the cuboids in the subtree rooted at ACD include dimension
A, all those rooted at ABD include dimensions AB, and all those rooted at ABC include
dimensions ABC (even though there is only one such cuboid). We call these common
dimensions the shared dimensions of those particular subtrees.
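As an illustration, the small Python sketch below (ours, under the assumption that each child cuboid in the top-down expansion is formed by dropping a single dimension from its parent, and that the shared dimensions are the prefix preceding the dropped dimension, which every descendant in that subtree retains) reproduces the labels shown in Figure 5.8:

def shared_dimensions(parent, dropped):
    """Shared dimensions of the subtree rooted at the child cuboid
    obtained by dropping `dropped` from `parent`."""
    return parent[:parent.index(dropped)]

for d in "ABCD":
    child = "ABCD".replace(d, "")
    print(child, "/", shared_dimensions("ABCD", d) or "(none)")
# Prints: BCD / (none), ACD / A, ABD / AB, ABC / ABC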
The introduction of shared dimensions facilitates shared computation. Because the
shared dimensions are identified early on in the tree expansion, we can avoid recom-
puting them later. For example, cuboid AB extending from ABD in Figure 5.8 would
actually be pruned because AB was already computed in ABD/AB. Similarly, cuboid