Page 195 -
P. 195

11-ch04-125-186-9780123814791
                         HAN
                                                            2011/6/1
          158   Chapter 4 Data Warehousing and Online Analytical Processing  3:17 Page 158  #34



                           The base cuboid contains all three dimensions, city, item, and year. It can return
                         the total sales for any combination of the three dimensions. The apex cuboid, or 0-D
                         cuboid, refers to the case where the group-by is empty. It contains the total sum of all
                         sales. The base cuboid is the least generalized (most specific) of the cuboids. The apex
                         cuboid is the most generalized (least specific) of the cuboids, and is often denoted as all.
                         If we start at the apex cuboid and explore downward in the lattice, this is equivalent to
                         drilling down within the data cube. If we start at the base cuboid and explore upward,
                         this is akin to rolling up.


                           An SQL query containing no group-by (e.g., “compute the sum of total sales”) is a zero-
                         dimensional operation. An SQL query containing one group-by (e.g., “compute the sum
                         of sales, group-by city”) is a one-dimensional operation. A cube operator on n dimensions
                         is equivalent to a collection of group-by statements, one for each subset of the n dimen-
                         sions. Therefore, the cube operator is the n-dimensional generalization of the group-by
                         operator.
                           Similar to the SQL syntax, the data cube in Example 4.1 could be defined as

                               define cube sales cube [city, item, year]: sum(sales in dollars)

                                                                 n
                         For a cube with n dimensions, there are a total of 2 cuboids, including the base cuboid.
                         A statement such as

                               compute cube sales cube

                         would explicitly instruct the system to compute the sales aggregate cuboids for all eight
                         subsets of the set {city, item, year}, including the empty subset. A cube computation
                                                                        +
                         operator was first proposed and studied by Gray et al. [GCB 97].
                           Online analytical processing may need to access different cuboids for different
                         queries. Therefore, it may seem like a good idea to compute in advance all or at least
                         some of the cuboids in a data cube. Precomputation leads to fast response time and
                         avoids some redundant computation. Most, if not all, OLAP products resort to some
                         degree of precomputation of multidimensional aggregates.
                           A major challenge related to this precomputation, however, is that the required stor-
                         age space may explode if all the cuboids in a data cube are precomputed, especially when
                         the cube has many dimensions. The storage requirements are even more excessive when
                         many of the dimensions have associated concept hierarchies, each with multiple levels.
                         This problem is referred to as the curse of dimensionality. The extent of the curse of
                         dimensionality is illustrated here.
                           “How many cuboids are there in an n-dimensional data cube?” If there were no
                         hierarchies associated with each dimension, then the total number of cuboids for an
                                                             n
                         n-dimensional data cube, as we have seen, is 2 . However, in practice, many dimensions
                         do have hierarchies. For example, time is usually explored not at only one conceptual
                         level (e.g., year), but rather at multiple conceptual levels such as in the hierarchy “day <
                         month < quarter < year.” For an n-dimensional data cube, the total number of cuboids
   190   191   192   193   194   195   196   197   198   199   200