Page 195 -
P. 195
11-ch04-125-186-9780123814791
HAN
2011/6/1
158 Chapter 4 Data Warehousing and Online Analytical Processing 3:17 Page 158 #34
The base cuboid contains all three dimensions, city, item, and year. It can return
the total sales for any combination of the three dimensions. The apex cuboid, or 0-D
cuboid, refers to the case where the group-by is empty. It contains the total sum of all
sales. The base cuboid is the least generalized (most specific) of the cuboids. The apex
cuboid is the most generalized (least specific) of the cuboids, and is often denoted as all.
If we start at the apex cuboid and explore downward in the lattice, this is equivalent to
drilling down within the data cube. If we start at the base cuboid and explore upward,
this is akin to rolling up.
An SQL query containing no group-by (e.g., “compute the sum of total sales”) is a zero-
dimensional operation. An SQL query containing one group-by (e.g., “compute the sum
of sales, group-by city”) is a one-dimensional operation. A cube operator on n dimensions
is equivalent to a collection of group-by statements, one for each subset of the n dimen-
sions. Therefore, the cube operator is the n-dimensional generalization of the group-by
operator.
Similar to the SQL syntax, the data cube in Example 4.1 could be defined as
define cube sales cube [city, item, year]: sum(sales in dollars)
n
For a cube with n dimensions, there are a total of 2 cuboids, including the base cuboid.
A statement such as
compute cube sales cube
would explicitly instruct the system to compute the sales aggregate cuboids for all eight
subsets of the set {city, item, year}, including the empty subset. A cube computation
+
operator was first proposed and studied by Gray et al. [GCB 97].
Online analytical processing may need to access different cuboids for different
queries. Therefore, it may seem like a good idea to compute in advance all or at least
some of the cuboids in a data cube. Precomputation leads to fast response time and
avoids some redundant computation. Most, if not all, OLAP products resort to some
degree of precomputation of multidimensional aggregates.
A major challenge related to this precomputation, however, is that the required stor-
age space may explode if all the cuboids in a data cube are precomputed, especially when
the cube has many dimensions. The storage requirements are even more excessive when
many of the dimensions have associated concept hierarchies, each with multiple levels.
This problem is referred to as the curse of dimensionality. The extent of the curse of
dimensionality is illustrated here.
“How many cuboids are there in an n-dimensional data cube?” If there were no
hierarchies associated with each dimension, then the total number of cuboids for an
n
n-dimensional data cube, as we have seen, is 2 . However, in practice, many dimensions
do have hierarchies. For example, time is usually explored not at only one conceptual
level (e.g., year), but rather at multiple conceptual levels such as in the hierarchy “day <
month < quarter < year.” For an n-dimensional data cube, the total number of cuboids