Page 248 -

P. 248

2011/6/1
#25
12-ch05-187-242-9780123814791
3:19 Page 211
HAN
5.2 Data Cube Computation Methods 211

threshold), it will generate 2 60 iceberg cube cells. Second, it is difﬁcult to determine an
appropriate iceberg threshold. Setting the threshold too low will result in a huge cube,
whereas setting the threshold too high may invalidate many useful applications. Third,
an iceberg cube cannot be incrementally updated. Once an aggregate cell falls below
the iceberg threshold and is pruned, its measure value is lost. Any incremental update
would require recomputing the cells from scratch. This is extremely undesirable for large
real-life applications where incremental appending of new data is the norm.
One possible solution, which has been implemented in some commercial data ware-
house systems, is to compute a thin cube shell. For example, we could compute all
cuboids with three dimensions or less in a 60-dimensional data cube, resulting in a cube
shell of size 3. The resulting cuboids set would require much less computation and stor-
age than the full 60-dimensional data cube. However, there are two disadvantages to
this approach. First, we would still need to compute 60 + 60 + 60 = 36,050 cuboids,
3 2
each with many cells. Second, such a cube shell does not support high-dimensional
OLAP because (1) it does not support OLAP on four or more dimensions, and (2) it
cannot even support drilling along three dimensions, such as, say, (A 4 , A 5 , A 6 ), on a sub-
set of data selected based on the constants provided in three other dimensions, such as
(A 1 , A 2 , A 3 ), because this essentially requires the computation of the corresponding 6-D
cuboid. (Notice that there is no cell in cuboid (A 4 , A 5 , A 6 ) computed for any particular
constant set, such as (a 1 , a 2 , a 3 ), associated with dimensions (A 1 , A 2 , A 3 ).)
Instead of computing a cube shell, we can compute only portions or fragments of it.
This section discusses the shell fragment approach for OLAP query processing. It is based
on the following key observation about OLAP in high-dimensional space. Although a
data cube may contain many dimensions, most OLAP operations are performed on only a
small number of dimensions at a time. In other words, an OLAP query is likely to ignore
many dimensions (i.e., treating them as irrelevant), ﬁx some dimensions (e.g., using
query constants as instantiations), and leave only a few to be manipulated (for drilling,
pivoting, etc.). This is because it is neither realistic nor fruitful for anyone to compre-
hend the changes of thousands of cells involving tens of dimensions simultaneously in a
high-dimensional space at the same time.
Instead, it is more natural to ﬁrst locate some cuboids of interest and then drill
along one or two dimensions to examine the changes of a few related dimensions.
Most analysts will only need to examine, at any one moment, the combinations of a
small number of dimensions. This implies that if multidimensional aggregates can be
computed quickly on a small number of dimensions inside a high-dimensional space, we
may still achieve fast OLAP without materializing the original high-dimensional data
cube. Computing the full cube (or, often, even an iceberg cube or cube shell) can be
excessive. Instead, a semi-online computation model with certain preprocessing may offer
a more feasible solution. Given a base cuboid, some quick preparation computation can
be done ﬁrst (i.e., ofﬂine). After that, a query can then be computed online using the
preprocessed data.
The shell fragment approach follows such a semi-online computation strategy. It
involves two algorithms: one for computing cube shell fragments and the other for query
processing with the cube fragments. The shell fragment approach can handle databases

243 244 245 246 247 248 249 250 251 252 253