
HAN 12-ch05-187-242-9780123814791


          188   Chapter 5 Data Cube Technology               2011/6/1  3:19  Page 188  #2



                         and thereby ready for use) and partial cuboid materialization (where, say, only the more
                         “useful” parts of the data cube are precomputed). We present the multiway array
                         aggregation method for full cube computation in detail, and then discuss methods for
                         partial cube computation, including BUC, Star-Cubing, and the use of cube shell fragments.
                           In Section 5.3, we study cube-based query processing. The techniques described build
                         on the standard methods of cube computation presented in Section 5.2. You will learn
                         about sampling cubes for OLAP query answering on sampling data (e.g., survey data,
                         which represent a sample or subset of a target data population of interest). In addition,
                         you will learn how to compute ranking cubes for efficient top-k (ranking) query
                         processing in large relational data sets.
                           In Section 5.4, we describe various ways to perform multidimensional data analysis
                         using data cubes. We introduce prediction cubes, which facilitate predictive modeling in
                         multidimensional space. We discuss multifeature cubes, which compute complex queries
                         involving multiple dependent aggregates at multiple granularities. You will also learn
                         about exception-based, discovery-driven exploration of cube space, in which visual cues
                         are displayed to indicate discovered data exceptions at all aggregation levels, thereby
                         guiding the user through the data analysis process.


                 5.1     Data Cube Computation: Preliminary Concepts



                         Data cubes facilitate the online analytical processing of multidimensional data. “But how
                         can we compute data cubes in advance, so that they are handy and readily available for
                         query processing?” This section contrasts full cube materialization (i.e., precomputation)
                         with various strategies for partial cube materialization. For completeness, we begin
                         with a review of the basic terminology involving data cubes. We also introduce a cube
                         cell notation that is useful for describing data cube computation methods.



                   5.1.1 Cube Materialization: Full Cube, Iceberg Cube, Closed Cube, and Cube Shell

                         Figure 5.1 shows a 3-D data cube for the dimensions A, B, and C, and an aggregate
                         measure, M. Commonly used measures include count(), sum(), min(), max(), and total_sales().
                         A data cube is a lattice of cuboids. Each cuboid represents a group-by. ABC is the base
                         cuboid, containing all three of the dimensions. Here, the aggregate measure, M, is
                         computed for each possible combination of the three dimensions. The base cuboid is the
                         least generalized of all the cuboids in the data cube. The most generalized cuboid is the
                         apex cuboid, commonly represented as all. It contains one value: it aggregates measure
                         M for all the tuples stored in the base cuboid. To drill down in the data cube, we move
                         from the apex cuboid downward in the lattice. To roll up, we move from the base cuboid
                         upward. For the purposes of our discussion in this chapter, we will always use the term
                         data cube to refer to a lattice of cuboids rather than an individual cuboid.
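To make the lattice concrete, here is a minimal Python sketch of naive full materialization for dimensions A, B, and C. It assumes a small hypothetical set of base tuples and uses sum() as the measure M. Each cuboid is one group-by over a subset of the dimensions, running from the base cuboid ABC down to the apex cuboid all (the empty group-by). The helper names (`cuboids`, `group_by`, `full_cube`, `roll_up`) and the toy data are illustrative assumptions. This is not the multiway array aggregation method discussed later in the chapter; it simply recomputes every group-by from the base data.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("A", "B", "C")

# Hypothetical base tuples: (value of A, value of B, value of C, measure M).
BASE = [
    ("a1", "b1", "c1", 10),
    ("a1", "b2", "c1", 5),
    ("a2", "b1", "c2", 7),
    ("a2", "b1", "c1", 3),
]


def cuboids(dims):
    """Yield every group-by in the lattice, from the base cuboid down to the apex ()."""
    for k in range(len(dims), -1, -1):
        yield from combinations(dims, k)


def group_by(rows, dims, keep):
    """Aggregate measure M with sum() over the dimensions listed in `keep`."""
    idx = [dims.index(d) for d in keep]
    cells = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        cells[key] += row[-1]  # the measure is the last field of each tuple
    return dict(cells)


def full_cube(rows, dims):
    """Full materialization: compute every cuboid of the lattice from the base data."""
    return {cub: group_by(rows, dims, cub) for cub in cuboids(dims)}


def roll_up(cells, from_dims, to_dims):
    """Derive a more general cuboid from a less general one.

    This is valid here because sum() is distributive: partial sums can be re-summed.
    """
    idx = [from_dims.index(d) for d in to_dims]
    out = defaultdict(int)
    for key, m in cells.items():
        out[tuple(key[i] for i in idx)] += m
    return dict(out)


cube = full_cube(BASE, DIMS)
print(len(cube))   # 8 cuboids, i.e., 2**3 group-bys
print(cube[()])    # apex cuboid "all": {(): 25}
```

Rolling up the base cuboid to the A cuboid, for example with `roll_up(cube[("A", "B", "C")], DIMS, ("A",))`, gives the same cells as `cube[("A",)]`. Deriving coarser cuboids from finer ones in this way, rather than rescanning the base data each time, is the kind of sharing that the full cube computation methods in this chapter exploit.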