                               customer depended greatly on the customer’s gender?” Notice that he believes time and
                               location play a role in predicting valued customers, but at what granularity levels do
                               they depend on gender for this task? For example, is performing analysis using {month,
                               country} better than {year, state}?
Consider a data table D (e.g., the customer table). Let X be the set of attributes for
which no concept hierarchy has been defined (e.g., gender, salary). Let Y be the class-
label attribute (e.g., valued customer), and Z be the set of multilevel attributes, that is,
attributes for which concept hierarchies have been defined (e.g., time, location). Let V
be the set of attributes whose predictiveness we would like to evaluate. In our
example, this set is {gender}. The predictiveness of V on a data subset can be quantified
                               by the difference in accuracy between the model built on that subset using X to predict Y
                               and the model built on that subset using X − V (e.g., {salary}) to predict Y. The intuition
                               is that, if the difference is large, V must play an important role in the prediction of class
                               label Y.
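To make this measure concrete, the following is a minimal sketch (not from the text) of the predictiveness computation. It assumes pandas and scikit-learn, a decision tree as the learning algorithm, and illustrative column names such as gender, salary, and valued_customer; accuracy is estimated here by cross-validation on the subset, which is one reasonable choice.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def predictiveness(subset: pd.DataFrame, X_cols, V_cols, y_col) -> float:
    """Accuracy of a model built on X minus accuracy of a model built on X - V."""
    y = subset[y_col]
    # Model built on the full attribute set X (features assumed numerically encoded).
    acc_X = cross_val_score(DecisionTreeClassifier(), subset[X_cols], y, cv=5).mean()
    # Model built on X - V (e.g., {salary} when V = {gender}).
    reduced = [c for c in X_cols if c not in V_cols]
    acc_reduced = cross_val_score(DecisionTreeClassifier(), subset[reduced], y, cv=5).mean()
    # A large difference suggests V plays an important role in predicting Y.
    return acc_X - acc_reduced

# e.g., predictiveness(subset, ["gender", "salary"], ["gender"], "valued_customer")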
Given a set of attributes, V, and a learning algorithm, the prediction cube at granularity
⟨l1, ..., ld⟩ (e.g., ⟨year, state⟩) is a d-dimensional array, in which the value in each cell
                               (e.g., [2010, Illinois]) is the predictiveness of V evaluated on the subset defined by the
                               cell (e.g., the records in the customer table with time in 2010 and location in Illinois).
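As a further sketch under the same assumptions, one granularity of the cube can be materialized by grouping the table on the dimension attributes at the chosen levels and applying the hypothetical predictiveness() helper above to each cell's subset; the column names year and state stand in for levels of the time and location hierarchies.

import pandas as pd

def prediction_cube(D: pd.DataFrame, dims, X_cols, V_cols, y_col) -> pd.Series:
    """Map each cell (e.g., (2010, 'Illinois')) to the predictiveness of V on its subset."""
    return D.groupby(list(dims)).apply(
        lambda cell: predictiveness(cell, X_cols, V_cols, y_col)
    )

# e.g., prediction_cube(customer, ("year", "state"),
#                       ["gender", "salary"], ["gender"], "valued_customer")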
                                 Supporting OLAP roll-up and drill-down operations on a prediction cube is a
                               computational challenge requiring the materialization of cell values at many different
                               granularities. For simplicity, we can consider only full materialization. A naïve way to
                               fully materialize a prediction cube is to exhaustively build models and evaluate them for
                               each cell and granularity. This method is very expensive if the base data set is large.
                               An ensemble method called Probability-Based Ensemble (PBE) was developed as a
                               more feasible alternative. It requires model construction for only the finest-grained
                               cells. OLAP-style bottom-up aggregation is then used to generate the values of the
                               coarser-grained cells.
                                 The prediction of a predictive model can be seen as finding a class label that maxi-
                               mizes a scoring function. The PBE method was developed to approximately make the
                               scoring function of any predictive model distributively decomposable. In our discus-
                               sion of data cube measures in Section 4.2.4, we showed that distributive and algebraic
                               measures can be computed efficiently. Therefore, if the scoring function used is dis-
                               tributively or algebraically decomposable, prediction cubes can also be computed with
                               efficiency. In this way, the PBE method reduces prediction cube computation to data
                               cube computation.
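To illustrate the reduction (a hedged sketch, not the published PBE algorithm), suppose the scoring function's sufficient statistics are simple counts, as in the naïve Bayes case discussed next. Such counts are distributive: the statistics of a coarser cell are just the sums of those of its finest-grained descendant cells, so the roll-up itself is ordinary data cube aggregation. The base cells, counts, and level mappings below are hypothetical.

from collections import Counter, defaultdict

def roll_up(base_stats, parent_of):
    """Sum distributive statistics (here, class counts) from finest-grained cells
    into their parent cells -- the data cube part of the computation."""
    coarse = defaultdict(Counter)
    for base_cell, counts in base_stats.items():
        coarse[parent_of(base_cell)].update(counts)
    return coarse

# Hypothetical finest-grained cells keyed by (month, city), each holding class counts.
base_stats = {
    ("2010-01", "Chicago"):     Counter({"valued": 40, "not_valued": 60}),
    ("2010-02", "Springfield"): Counter({"valued": 25, "not_valued": 75}),
}

# Hypothetical mapping of base cells to the coarser granularity (year, state).
parent = {
    ("2010-01", "Chicago"):     ("2010", "Illinois"),
    ("2010-02", "Springfield"): ("2010", "Illinois"),
}

year_state = roll_up(base_stats, parent.__getitem__)
# year_state[("2010", "Illinois")] -> Counter({'not_valued': 135, 'valued': 65})
# The coarse cell's score is then recomputed from these aggregated statistics,
# with no model rebuilt on the underlying records.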
For example, previous studies have shown that the naïve Bayes classifier has an alge-
braically decomposable scoring function, and the kernel density-based classifier has a
distributively decomposable scoring function.⁸ Therefore, either of these could be used

⁸ Naïve Bayes classifiers are detailed in Chapter 8. Kernel density-based classifiers, such as support vector
machines, are described in Chapter 9.