Chapter 5 Data Cube Technology
and thereby ready for use) and partial cuboid materialization (where, say, only the more
“useful” parts of the data cube are precomputed). The multiway array aggregation
method is detailed for full cube computation. Methods for partial cube computation,
including BUC, Star-Cubing, and the use of cube shell fragments, are discussed.
In Section 5.3, we study cube-based query processing. The techniques described build
on the standard methods of cube computation presented in Section 5.2. You will learn
about sampling cubes for OLAP query answering on sampling data (e.g., survey data,
which represent a sample or subset of a target data population of interest). In addition,
you will learn how to compute ranking cubes for efficient top-k (ranking) query
processing in large relational data sets.
In Section 5.4, we describe various ways to perform multidimensional data analysis
using data cubes. Prediction cubes are introduced, which facilitate predictive modeling in
multidimensional space. We discuss multifeature cubes, which compute complex queries
involving multiple dependent aggregates at multiple granularities. You will also learn
about the exception-based discovery-driven exploration of cube space, where visual cues
are displayed to indicate discovered data exceptions at all aggregation levels, thereby
guiding the user in the data analysis process.
5.1 Data Cube Computation: Preliminary Concepts
Data cubes facilitate the online analytical processing of multidimensional data. “But how
can we compute data cubes in advance, so that they are handy and readily available for
query processing?” This section contrasts full cube materialization (i.e., precomputation)
with various strategies for partial cube materialization. For completeness, we begin
with a review of the basic terminology involving data cubes. We also introduce a cube
cell notation that is useful for describing data cube computation methods.
5.1.1 Cube Materialization: Full Cube, Iceberg Cube, Closed Cube, and Cube Shell
Figure 5.1 shows a 3-D data cube for the dimensions A, B, and C, and an aggregate measure,
M. Commonly used measures include count(), sum(), min(), max(), and total_sales().
A data cube is a lattice of cuboids. Each cuboid represents a group-by. ABC is the base
cuboid, containing all three of the dimensions. Here, the aggregate measure, M, is
computed for each possible combination of the three dimensions. The base cuboid is the
least generalized of all the cuboids in the data cube. The most generalized cuboid is the
apex cuboid, commonly represented as all. It contains one value—it aggregates measure
M for all the tuples stored in the base cuboid. To drill down in the data cube, we move
from the apex cuboid downward in the lattice. To roll up, we move from the base cuboid
upward. For the purposes of our discussion in this chapter, we will always use the term
data cube to refer to a lattice of cuboids rather than an individual cuboid.
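
The lattice described above can be made concrete with a minimal sketch. It assumes a tiny, made-up base table over dimensions A, B, and C with sum() as the measure M; the tuples and the helper names cuboid and full_cube are hypothetical and serve only as illustration. Each subset of the dimensions is one group-by, so a 3-D cube without concept hierarchies has 2^3 = 8 cuboids, from the apex (the empty group-by) to the base cuboid ABC.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical base-cuboid tuples (A, B, C, M); the values are invented for illustration.
rows = [
    ("a1", "b1", "c1", 5),
    ("a1", "b1", "c2", 3),
    ("a1", "b2", "c1", 2),
    ("a2", "b1", "c1", 4),
]
DIMS = ("A", "B", "C")


def cuboid(rows, group_dims, dims=DIMS):
    """Compute sum(M) for one group-by; group_dims=() yields the apex cuboid."""
    idx = [dims.index(d) for d in group_dims]
    cells = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)  # cell coordinates in this cuboid
        cells[key] += row[-1]             # the measure M is the last field
    return dict(cells)


def full_cube(rows, dims=DIMS):
    """Materialize every cuboid in the lattice, from apex (0-D) to base (n-D)."""
    return {
        group: cuboid(rows, group, dims)
        for k in range(len(dims) + 1)
        for group in combinations(dims, k)
    }


cube = full_cube(rows)
for group, cells in cube.items():
    name = "".join(group) or "all"
    print(name, cells)
```

In this sketch, rolling up from cuboid AB to cuboid A amounts to summing the AB cells that share the same A value, and drilling down moves the other way. Recomputing every cuboid directly from the base tuples, as done here, is the naive approach and is not one of the optimized computation methods that Section 5.2 covers.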