Page 247 -

P. 247

HAN
12-ch05-187-242-9780123814791

210 Chapter 5 Data Cube Technology 2011/6/1 3:19 Page 210 #24

on b 1 just as it did on b∗. This traversal and the resultant trees are shown in Figure 5.12.
The child trees ACD/A and ABD/AB are created again but now with the new values
from the b 1 subtree. For example, notice that the aggregate count of c∗ in the ACD/A
tree has increased from 1 to 3. The trees that remained intact during the last traversal
are reused and the new aggregate values are added on. For instance, another branch is
added to the BCD tree.
Just like before, the algorithm will reach a leaf node at d∗ and traverse back. This
time, it will reach a 1 and notice that there exists a sibling in a 2 . In this case, all child
trees except BCD in Figure 5.12 are destroyed. Afterward, the algorithm will perform
the same traversal on a 2 . BCD continues to grow while the other subtrees start fresh
with a 2 instead of a 1 .

A node must satisfy two conditions in order to generate child trees: (1) the measure
of the node must satisfy the iceberg condition; and (2) the tree to be generated must
include at least one non-star-node (i.e., nontrivial). This is because if all the nodes were
star-nodes, then none of them would satisfy min sup. Therefore, it would be a complete
waste to compute them. This pruning is observed in Figures 5.11 and 5.12. For example,
the left subtree extending from node a 1 in the base tree in Figure 5.11 does not include
any nonstar-nodes. Therefore, the a 1 CD/a 1 subtree should not have been generated. It
is shown, however, for illustration of the child tree generation process.
Star-Cubing is sensitive to the ordering of dimensions, as with other iceberg cube
construction algorithms. For best performance, the dimensions are processed in order
of decreasing cardinality. This leads to a better chance of early pruning, because the
higher the cardinality, the smaller the partitions, and therefore the higher possibility
that the partition will be pruned.
Star-Cubing can also be used for full cube computation. When computing the full
cube for a dense data set, Star-Cubing’s performance is comparable with MultiWay and
is much faster than BUC. If the data set is sparse, Star-Cubing is signiﬁcantly faster
than MultiWay and faster than BUC, in most cases. For iceberg cube computation, Star-
Cubing is faster than BUC, where the data are skewed and the speed-up factor increases
as min sup decreases.

5.2.4 Precomputing Shell Fragments for Fast
High-Dimensional OLAP
Recall the reason that we are interested in precomputing data cubes: Data cubes facil-
itate fast OLAP in a multidimensional data space. However, a full data cube of high
dimensionality needs massive storage space and unrealistic computation time. Iceberg
cubes provide a more feasible alternative, as we have seen, wherein the iceberg con-
dition is used to specify the computation of only a subset of the full cube’s cells.
However, although an iceberg cube is smaller and requires less computation time than
its corresponding full cube, it is not an ultimate solution.
For one, the computation and storage of the iceberg cube can still be costly. For exam-
ple, if the base cuboid cell, (a 1 , a 2 ,..., a 60 ), passes minimum support (or the iceberg

242 243 244 245 246 247 248 249 250 251 252