Chapter 5 Data Cube Technology

interest into intuitive regions at various granularities. It analyzes and mines the data by
applying various data mining techniques systematically over these regions.
There are at least four ways in which OLAP-style analysis can be fused with data
mining techniques:

1. Use cube space to define the data space for mining. Each region in cube space
represents a subset of data over which we wish to find interesting patterns. Cube space
is defined by a set of expert-designed, informative dimension hierarchies, not just
arbitrary subsets of data. Therefore, the use of cube space makes the data space both
meaningful and tractable.
2. Use OLAP queries to generate features and targets for mining. The features and even
the targets (that we wish to learn to predict) can sometimes be naturally defined as
OLAP aggregate queries over regions in cube space.
3. Use data mining models as building blocks in a multistep mining process.
Multidimensional data mining in cube space may consist of multiple steps, where data mining
models can be viewed as building blocks that are used to describe the behavior of
interesting data sets, rather than the end results.
4. Use data cube computation techniques to speed up repeated model construction.
Multidimensional data mining in cube space may require building a model for each
candidate data space, which is usually too expensive to be feasible. However, by
carefully sharing computation across model construction for different candidates based
on data cube computation techniques, efficient mining is achievable.
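
The second and fourth points can be illustrated with a minimal sketch. The Python fragment below computes OLAP-style aggregates over fine-grained (month, state) cells and then rolls them up to (year, country) cells by adding the finer cells' statistics instead of rescanning the rows. The table contents, the column layout, and the helper names (aggregate, roll_up, features) are hypothetical and chosen only for illustration; the sharing works here because counts and sums are additive, which need not hold for every model.

```python
from collections import defaultdict

# Hypothetical rows: (year, month, country, state, salary, valued_customer)
rows = [
    (2010, 1, "US", "WI", 50000, 1),
    (2010, 1, "US", "WI", 30000, 0),
    (2010, 2, "US", "CA", 70000, 1),
    (2011, 1, "US", "CA", 40000, 0),
]

def aggregate(rows):
    """Fine cells keyed by (year, month, country, state).

    Each cell keeps [count, salary_sum, valued_count].
    """
    cells = defaultdict(lambda: [0, 0, 0])
    for year, month, country, state, salary, valued in rows:
        stats = cells[(year, month, country, state)]
        stats[0] += 1
        stats[1] += salary
        stats[2] += valued
    return cells

def roll_up(cells):
    """Merge fine cells into (year, country) cells by adding their statistics."""
    coarse = defaultdict(lambda: [0, 0, 0])
    for (year, _month, country, _state), stats in cells.items():
        target = coarse[(year, country)]
        for i, value in enumerate(stats):
            target[i] += value
    return coarse

def features(stats):
    """Derive a feature (average salary) and a target (fraction valued) for a cell."""
    count, salary_sum, valued_count = stats
    return {"avg_salary": salary_sum / count, "valued_rate": valued_count / count}

base = aggregate(rows)
coarse = roll_up(base)
cell_features = {cell: features(stats) for cell, stats in coarse.items()}
```

Here cell_features maps each (year, country) region to values that could serve as inputs or targets for a later mining step.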

In this subsection we study prediction cubes, an example of multidimensional data
mining where the cube space is explored for prediction tasks. A prediction cube is a cube
structure that stores prediction models in multidimensional data space and supports
prediction in an OLAP manner. Recall that in a data cube, each cell value is an aggregate
number (e.g., count) computed over the data subset in that cell. However, each cell value
in a prediction cube is computed by evaluating a predictive model built on the data
subset in that cell, thereby representing that subset’s predictive behavior.
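
The contrast with an ordinary data cube can be sketched as follows. In this hypothetical Python fragment, each (year, state) cell holds the accuracy of a model trained only on that cell's rows. The toy model (majority label per gender), the use of a separate held-out test set for scoring, and all names and data are assumptions made for illustration, not the method defined in the text.

```python
from collections import Counter, defaultdict

# Hypothetical training rows: (year, state, gender, valued_customer)
train = [
    (2010, "WI", "F", 1), (2010, "WI", "M", 0), (2010, "WI", "F", 1),
    (2010, "CA", "F", 0), (2010, "CA", "M", 0), (2010, "CA", "M", 0),
]
# Hypothetical held-out test set of (gender, valued_customer) pairs.
test = [("F", 1), ("M", 0), ("F", 1), ("M", 1)]

def fit(subset):
    """Toy model: predict the majority label seen for each gender in the subset."""
    votes = defaultdict(Counter)
    for _year, _state, gender, label in subset:
        votes[gender][label] += 1
    # Fallback for genders that never occur in the subset.
    overall = Counter(label for *_, label in subset).most_common(1)[0][0]
    by_gender = {g: c.most_common(1)[0][0] for g, c in votes.items()}
    return lambda gender: by_gender.get(gender, overall)

def accuracy(model, test):
    """Fraction of test examples the model labels correctly."""
    return sum(model(g) == y for g, y in test) / len(test)

def prediction_cube(train, test):
    """Map each (year, state) cell to the accuracy of the model built on its rows."""
    cells = defaultdict(list)
    for row in train:
        cells[(row[0], row[1])].append(row)
    return {cell: accuracy(fit(subset), test) for cell, subset in cells.items()}

cube = prediction_cube(train, test)
```

With this toy data, the (2010, "WI") cell scores higher than the (2010, "CA") cell, so the cell value is a measure of predictive behavior rather than a simple count.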
Instead of seeing prediction models as the end result, prediction cubes use prediction
models as building blocks to define the interestingness of data subsets; that is, they
identify data subsets whose models predict more accurately. This is best explained with an
example.

Example 5.18 Prediction cube for identification of interesting cube subspaces. Suppose a company
has a customer table with the attributes time (with two granularity levels: month and
year), location (with two granularity levels: state and country), gender, salary, and one
class-label attribute: valued customer. A manager wants to analyze the decision process
of whether a customer is highly valued with respect to time and location. In particular,
he is interested in the question “Are there times at and locations in which the value of a
