Page 196 -
P. 196

#35
                                                                     3:17 Page 159
                               11-ch04-125-186-9780123814791
                         HAN
                                                            2011/6/1
                                                                4.4 Data Warehouse Implementation  159


                               that can be generated (including the cuboids generated by climbing up the hierarchies
                               along each dimension) is
                                                                        n
                                                                       Y
                                                  Total number of cuboids =  (L i + 1),         (4.1)
                                                                       i=1
                               where L i is the number of levels associated with dimension i. One is added to L i in
                               Eq. (4.1) to include the virtual top level, all. (Note that generalizing to all is equivalent to
                               the removal of the dimension.)
                                 This formula is based on the fact that, at most, one abstraction level in each dimen-
                               sion will appear in a cuboid. For example, the time dimension as specified before has
                               four conceptual levels, or five if we include the virtual level all. If the cube has 10 dimen-
                               sions and each dimension has five levels (including all), the total number of cuboids
                                                              6
                               that can be generated is 5 10  ≈ 9.8 × 10 . The size of each cuboid also depends on the
                               cardinality (i.e., number of distinct values) of each dimension. For example, if the All-
                               Electronics branch in each city sold every item, there would be |city| × |item| tuples in the
                               city − item group-by alone. As the number of dimensions, number of conceptual hierar-
                               chies, or cardinality increases, the storage space required for many of the group-by’s will
                               grossly exceed the (fixed) size of the input relation.
                                 By now, you probably realize that it is unrealistic to precompute and materialize all
                               of the cuboids that can possibly be generated for a data cube (i.e., from a base cuboid).
                               If there are many cuboids, and these cuboids are large in size, a more reasonable option
                               is partial materialization; that is, to materialize only some of the possible cuboids that
                               can be generated.

                               Partial Materialization: Selected Computation
                               of Cuboids
                               There are three choices for data cube materialization given a base cuboid:

                               1. No materialization: Do not precompute any of the “nonbase” cuboids. This leads
                                 to computing expensive multidimensional aggregates on-the-fly, which can be extre-
                                 mely slow.
                               2. Full materialization: Precompute all of the cuboids. The resulting lattice of com-
                                 puted cuboids is referred to as the full cube. This choice typically requires huge
                                 amounts of memory space in order to store all of the precomputed cuboids.
                               3. Partial materialization: Selectively compute a proper subset of the whole set of pos-
                                 sible cuboids. Alternatively, we may compute a subset of the cube, which contains
                                 only those cells that satisfy some user-specified criterion, such as where the tuple
                                 count of each cell is above some threshold. We will use the term subcube to refer to
                                 the latter case, where only some of the cells may be precomputed for various cuboids.
                                 Partial materialization represents an interesting trade-off between storage space and
                                 response time.
   191   192   193   194   195   196   197   198   199   200   201