Page 248 -
P. 248

2011/6/1
                                                                                    #25
                               12-ch05-187-242-9780123814791
                                                                     3:19 Page 211
                         HAN
                                                               5.2 Data Cube Computation Methods  211


                               threshold), it will generate 2 60  iceberg cube cells. Second, it is difficult to determine an
                               appropriate iceberg threshold. Setting the threshold too low will result in a huge cube,
                               whereas setting the threshold too high may invalidate many useful applications. Third,
                               an iceberg cube cannot be incrementally updated. Once an aggregate cell falls below
                               the iceberg threshold and is pruned, its measure value is lost. Any incremental update
                               would require recomputing the cells from scratch. This is extremely undesirable for large
                               real-life applications where incremental appending of new data is the norm.
                                 One possible solution, which has been implemented in some commercial data ware-
                               house systems, is to compute a thin cube shell. For example, we could compute all
                               cuboids with three dimensions or less in a 60-dimensional data cube, resulting in a cube
                               shell of size 3. The resulting cuboids set would require much less computation and stor-
                               age than the full 60-dimensional data cube. However, there are two disadvantages to
                               this approach. First, we would still need to compute  60    +  60   + 60 = 36,050 cuboids,
                                                                          3    2
                               each with many cells. Second, such a cube shell does not support high-dimensional
                               OLAP because (1) it does not support OLAP on four or more dimensions, and (2) it
                               cannot even support drilling along three dimensions, such as, say, (A 4 , A 5 , A 6 ), on a sub-
                               set of data selected based on the constants provided in three other dimensions, such as
                               (A 1 , A 2 , A 3 ), because this essentially requires the computation of the corresponding 6-D
                               cuboid. (Notice that there is no cell in cuboid (A 4 , A 5 , A 6 ) computed for any particular
                               constant set, such as (a 1 , a 2 , a 3 ), associated with dimensions (A 1 , A 2 , A 3 ).)
                                 Instead of computing a cube shell, we can compute only portions or fragments of it.
                               This section discusses the shell fragment approach for OLAP query processing. It is based
                               on the following key observation about OLAP in high-dimensional space. Although a
                               data cube may contain many dimensions, most OLAP operations are performed on only a
                               small number of dimensions at a time. In other words, an OLAP query is likely to ignore
                               many dimensions (i.e., treating them as irrelevant), fix some dimensions (e.g., using
                               query constants as instantiations), and leave only a few to be manipulated (for drilling,
                               pivoting, etc.). This is because it is neither realistic nor fruitful for anyone to compre-
                               hend the changes of thousands of cells involving tens of dimensions simultaneously in a
                               high-dimensional space at the same time.
                                 Instead, it is more natural to first locate some cuboids of interest and then drill
                               along one or two dimensions to examine the changes of a few related dimensions.
                               Most analysts will only need to examine, at any one moment, the combinations of a
                               small number of dimensions. This implies that if multidimensional aggregates can be
                               computed quickly on a small number of dimensions inside a high-dimensional space, we
                               may still achieve fast OLAP without materializing the original high-dimensional data
                               cube. Computing the full cube (or, often, even an iceberg cube or cube shell) can be
                               excessive. Instead, a semi-online computation model with certain preprocessing may offer
                               a more feasible solution. Given a base cuboid, some quick preparation computation can
                               be done first (i.e., offline). After that, a query can then be computed online using the
                               preprocessed data.
                                 The shell fragment approach follows such a semi-online computation strategy. It
                               involves two algorithms: one for computing cube shell fragments and the other for query
                               processing with the cube fragments. The shell fragment approach can handle databases
   243   244   245   246   247   248   249   250   251   252   253