lattice. Iceberg cubes and shell fragments are examples of partial materialization. An
iceberg cube is a data cube that stores only those cube cells that have an aggregate
value (e.g., count) above some minimum support threshold. For shell fragments of
a data cube, only some cuboids involving a small number of dimensions are computed,
and queries on additional combinations of the dimensions can be answered
on the fly.
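The iceberg condition can be illustrated with a minimal sketch. It assumes count as the measure, treats the threshold as "count >= min_sup", and uses a naive enumeration of every generalization of every tuple, so it shows only what an iceberg cube contains, not an efficient way to compute one. The function name `iceberg_cube` and the sample rows are hypothetical.

```python
from collections import Counter
from itertools import product

def iceberg_cube(rows, min_sup):
    """Return {cell: count} for every cell whose count is >= min_sup.

    A cell is a tuple with one entry per dimension; "*" marks a
    dimension that has been aggregated away.
    """
    counts = Counter()
    for row in rows:
        # Each base tuple contributes to 2^n cells: every dimension is
        # either kept at its value or generalized to "*".
        for mask in product((False, True), repeat=len(row)):
            cell = tuple("*" if star else v for v, star in zip(row, mask))
            counts[cell] += 1
    # Keep only the cells that satisfy the iceberg condition.
    return {cell: c for cell, c in counts.items() if c >= min_sup}

# Hypothetical (city, item, quarter) tuples, for illustration only.
rows = [
    ("Vancouver", "TV", "Q1"),
    ("Vancouver", "TV", "Q2"),
    ("Vancouver", "phone", "Q1"),
    ("Toronto", "TV", "Q1"),
]
cube = iceberg_cube(rows, min_sup=2)
```

With `min_sup=2`, cells such as `("Vancouver", "TV", "*")` are retained, while every cell involving `"Toronto"` is dropped because only one tuple supports it.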
There are several efficient data cube computation methods. In this chapter, we discussed
four cube computation methods in detail: (1) MultiWay array aggregation for
materializing full data cubes in sparse-array-based, bottom-up, shared computation;
(2) BUC for computing iceberg cubes by exploring ordering and sorting for efficient
top-down computation; (3) Star-Cubing for computing iceberg cubes by integrating
top-down and bottom-up computation using a star-tree structure; and
(4) shell-fragment cubing, which supports high-dimensional OLAP by precomputing only
the partitioned cube shell fragments.
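As a rough illustration of method (2), the following is a simplified, BUC-style sketch rather than the published algorithm. It assumes count as the measure and the condition "count >= min_sup", starts at the apex cell, and refines one dimension at a time. Grouping with a dictionary stands in for the sorting and partitioning step, and the dimension-ordering heuristics are omitted. The names `buc`, `expand`, and `sales_rows` are hypothetical.

```python
from collections import defaultdict

def buc(rows, min_sup):
    """Return {cell: count} for all cells with count >= min_sup."""
    n_dims = len(rows[0]) if rows else 0
    result = {}

    def expand(partition, cell, first_dim):
        # `partition` holds exactly the tuples that match `cell`.
        result[tuple(cell)] = len(partition)
        for d in range(first_dim, n_dims):
            # Partition the current tuples on dimension d.
            groups = defaultdict(list)
            for row in partition:
                groups[row[d]].append(row)
            for value, group in groups.items():
                # Pruning: if a partition fails min_sup, every more
                # specific cell derived from it fails too, so stop here.
                if len(group) >= min_sup:
                    cell[d] = value
                    expand(group, cell, d + 1)
                    cell[d] = "*"  # restore before trying the next value

    if len(rows) >= min_sup:
        expand(rows, ["*"] * n_dims, 0)
    return result

# Hypothetical (city, item, quarter) tuples, for illustration only.
sales_rows = [
    ("Vancouver", "TV", "Q1"),
    ("Vancouver", "TV", "Q2"),
    ("Vancouver", "phone", "Q1"),
    ("Toronto", "TV", "Q1"),
]
iceberg = buc(sales_rows, min_sup=2)
```

Because dimensions are instantiated in increasing order, each qualifying cell is visited once, and a partition that falls below `min_sup` is never subdivided further.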
Multidimensional data mining in cube space is the integration of knowledge discovery
with multidimensional data cubes. It facilitates systematic and focused knowledge
discovery in large structured and semi-structured data sets. It will continue to endow
analysts with tremendous flexibility and power in multidimensional and multigranularity
exploratory analysis. This is a vast open area in which researchers can build powerful
and sophisticated data mining mechanisms.
Techniques for processing advanced queries have been proposed that take advantage
of cube technology. These include sampling cubes for multidimensional analysis on
sampling data, and ranking cubes for efficient top-k (ranking) query processing in
large relational data sets.
This chapter highlighted three approaches to multidimensional data analysis with
data cubes. Prediction cubes compute prediction models in multidimensional
cube space. They help users identify interesting data subsets at varying degrees of
granularity for effective prediction. Multifeature cubes compute complex queries
involving multiple dependent aggregates at multiple granularities. Exception-based,
discovery-driven exploration of cube space displays visual cues to indicate discovered
data exceptions at all aggregation levels, thereby guiding the user in the data
analysis process.
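To make "dependent aggregates at multiple granularities" concrete, here is a small sketch of one multifeature-style query. It assumes a hypothetical sales table with `price` and `sales` columns. For every group-by over subsets of the chosen dimensions, it finds the maximum price and then the total sales restricted to the tuples at that maximum price, so the second aggregate depends on the first. The function name, column names, and data are illustrative assumptions.

```python
from collections import defaultdict
from itertools import product

def max_price_sales(rows, dims):
    """Map each cell to (max price, total sales among max-price tuples)."""
    groups = defaultdict(list)
    for row in rows:
        # Assign the row to every granularity: each dimension is either
        # kept at its value or rolled up to "*".
        for mask in product((False, True), repeat=len(dims)):
            cell = tuple("*" if star else row[d] for d, star in zip(dims, mask))
            groups[cell].append(row)
    out = {}
    for cell, members in groups.items():
        max_price = max(r["price"] for r in members)
        # Dependent aggregate: computed only over the max-price tuples.
        sales_at_max = sum(r["sales"] for r in members if r["price"] == max_price)
        out[cell] = (max_price, sales_at_max)
    return out

# Hypothetical rows, for illustration only.
sales = [
    {"item": "TV", "region": "West", "price": 900, "sales": 5},
    {"item": "TV", "region": "West", "price": 1200, "sales": 2},
    {"item": "TV", "region": "East", "price": 1200, "sales": 3},
    {"item": "phone", "region": "West", "price": 600, "sales": 7},
]
answer = max_price_sales(sales, dims=("item", "region"))
```

Here `answer[("TV", "*")]` combines the two TV rows priced at 1200 across both regions, whereas `answer[("TV", "West")]` sees only the one such row in the West.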
5.6 Exercises
5.1 Assume that a 10-D base cuboid contains only three base cells: (1) $(a_1, d_2, d_3, d_4, \ldots, d_9, d_{10})$,
(2) $(d_1, b_2, d_3, d_4, \ldots, d_9, d_{10})$, and (3) $(d_1, d_2, c_3, d_4, \ldots, d_9, d_{10})$, where $a_1 \neq d_1$,
$b_2 \neq d_2$, and $c_3 \neq d_3$. The measure of the cube is count().
(a) How many nonempty cuboids will a full data cube contain?
(b) How many nonempty aggregate (i.e., nonbase) cells will a full cube contain?