
          228   Chapter 5 Data Cube Technology



                         interest into intuitive regions at various granularities. It analyzes and mines the data by
                         applying various data mining techniques systematically over these regions.
                           There are at least four ways in which OLAP-style analysis can be fused with data
                         mining techniques:

                         1. Use cube space to define the data space for mining. Each region in cube space repre-
                           sents a subset of data over which we wish to find interesting patterns. Cube space
                           is defined by a set of expert-designed, informative dimension hierarchies, not just
                           arbitrary subsets of data. Therefore, the use of cube space makes the data space both
                           meaningful and tractable.
                         2. Use OLAP queries to generate features and targets for mining. The features and even
                           the targets (that we wish to learn to predict) can sometimes be naturally defined as
                           OLAP aggregate queries over regions in cube space.
                         3. Use data mining models as building blocks in a multistep mining process. Multidimen-
                           sional data mining in cube space may consist of multiple steps, where data mining
                           models can be viewed as building blocks that are used to describe the behavior of
                           interesting data sets, rather than the end results.
                         4. Use data cube computation techniques to speed up repeated model construction. Multi-
                           dimensional data mining in cube space may require building a model for each
                           candidate data space, which is usually too expensive to be feasible. However, by care-
                           fully sharing computation across model construction for different candidates based
                           on data cube computation techniques, efficient mining is achievable.
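The sharing idea in item 4 can be illustrated with a small sketch. It assumes a model whose sufficient statistics are additive counts (here, counts of attribute-value and class-label pairs); this is an assumption made for illustration, not a method stated in the text. Under that assumption, the statistics for a coarser region are obtained by merging those of the finer cells it covers, without rescanning the raw data. The cell keys, the `"*"` roll-up marker, and the `roll_up` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical base cells keyed by (month, state). Each cell stores counts of
# (gender, valued_customer) pairs, i.e., additive sufficient statistics.
BASE_CELLS = {
    ("2010-01", "WI"): Counter({("F", True): 2, ("M", False): 1}),
    ("2010-01", "IL"): Counter({("F", True): 1, ("M", True): 1}),
    ("2010-02", "WI"): Counter({("F", True): 1, ("F", False): 1,
                                ("M", True): 1, ("M", False): 1}),
}

def roll_up(base_cells, keep_month=True, keep_state=True):
    """Merge base-cell counts into coarser cells; '*' marks a rolled-up dimension."""
    merged = {}
    for (month, state), counts in base_cells.items():
        key = (month if keep_month else "*", state if keep_state else "*")
        # Counter.update adds counts, so each coarse cell is the sum of its children.
        merged.setdefault(key, Counter()).update(counts)
    return merged

by_month = roll_up(BASE_CELLS, keep_state=False)   # cells (month, '*')
apex = roll_up(BASE_CELLS, keep_month=False, keep_state=False)  # cell ('*', '*')
```

Any model that can be fit from these counts alone can then be built for every candidate region from one pass over the base cells. Models without additive statistics would need a different sharing strategy.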

                           In this subsection we study prediction cubes, an example of multidimensional data
                         mining where the cube space is explored for prediction tasks. A prediction cube is a cube
                         structure that stores prediction models in multidimensional data space and supports
                         prediction in an OLAP manner. Recall that in a data cube, each cell value is an aggregate
                         number (e.g., count) computed over the data subset in that cell. However, each cell value
                         in a prediction cube is computed by evaluating a predictive model built on the data
                         subset in that cell, thereby representing that subset’s predictive behavior.
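As a minimal sketch of this definition, the code below builds a toy prediction cube over two dimensions. The data, the `"*"` roll-up marker, and the choice of model are all assumptions for illustration: each cell's value is the training accuracy of a simple rule that predicts the majority class label for each gender value. A realistic prediction cube would use a proper classifier and a held-out evaluation.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows: (month, state, gender, valued_customer).
ROWS = [
    ("2010-01", "WI", "F", True),
    ("2010-01", "WI", "M", False),
    ("2010-01", "WI", "F", True),
    ("2010-01", "IL", "F", True),
    ("2010-01", "IL", "M", True),
    ("2010-02", "WI", "F", False),
    ("2010-02", "WI", "M", True),
    ("2010-02", "WI", "F", True),
    ("2010-02", "WI", "M", False),
]

def one_rule_accuracy(subset):
    """Fit a majority-label-per-gender rule on subset; return its training accuracy."""
    by_gender = defaultdict(Counter)
    for _, _, gender, label in subset:
        by_gender[gender][label] += 1
    # The rule is correct on the majority-label rows of each gender group.
    correct = sum(counts.most_common(1)[0][1] for counts in by_gender.values())
    return correct / len(subset)

def prediction_cube(rows):
    """Map each (month, state) cell, including '*' roll-ups, to a model score."""
    cells = defaultdict(list)
    for row in rows:
        month, state = row[0], row[1]
        for key in ((month, state), (month, "*"), ("*", state), ("*", "*")):
            cells[key].append(row)
    return {key: one_rule_accuracy(subset) for key, subset in cells.items()}

cube = prediction_cube(ROWS)
```

Comparing cell values then supports OLAP-style exploration: in this toy data, the rule scores higher in the cell for `("2010-01", "WI")` than in the cell for `("2010-02", "WI")`, so the former subset is the more predictable one under this model.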
                           Instead of seeing prediction models as the end result, prediction cubes use prediction
                         models as building blocks to define the interestingness of data subsets; that is, they
                         identify data subsets on which prediction is more accurate. This is best explained with an
                         example.

           Example 5.18 Prediction cube for identification of interesting cube subspaces. Suppose a company
                         has a customer table with the attributes time (with two granularity levels: month and
                         year), location (with two granularity levels: state and country), gender, salary, and one
                         class-label attribute: valued customer. A manager wants to analyze the decision process
                         of whether a customer is highly valued with respect to time and location. In particular,
                         he is interested in the question “Are there times at and locations in which the value of a