                               customer depended greatly on the customer’s gender?” Notice that he believes time and
                               location play a role in predicting valued customers, but at what granularity levels do
                               they depend on gender for this task? For example, is performing analysis using {month,
                               country} better than {year, state}?
Consider a data table D (e.g., the customer table). Let X be the set of attributes for
which no concept hierarchy has been defined (e.g., gender, salary). Let Y be the class-
label attribute (e.g., valued customer), and Z be the set of multilevel attributes, that is,
attributes for which concept hierarchies have been defined (e.g., time, location). Let V
be the set of attributes whose predictiveness we would like to evaluate. In our
example, this set is {gender}. The predictiveness of V on a data subset can be quantified
                               by the difference in accuracy between the model built on that subset using X to predict Y
                               and the model built on that subset using X − V (e.g., {salary}) to predict Y. The intuition
                               is that, if the difference is large, V must play an important role in the prediction of class
                               label Y.
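To make this measure concrete, the following is a minimal sketch (not from the text) of the predictiveness computation. It assumes pandas and scikit-learn, a decision tree as the learning algorithm, and illustrative column names such as gender, salary, and valued_customer; accuracy is estimated here by cross-validation on the subset, which is one reasonable choice.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def predictiveness(subset: pd.DataFrame, X_cols, V_cols, y_col) -> float:
    """Accuracy of a model built on X minus accuracy of a model built on X - V."""
    y = subset[y_col]
    # Model built on the full attribute set X (features assumed numerically encoded).
    acc_X = cross_val_score(DecisionTreeClassifier(), subset[X_cols], y, cv=5).mean()
    # Model built on X - V (e.g., {salary} when V = {gender}).
    reduced = [c for c in X_cols if c not in V_cols]
    acc_reduced = cross_val_score(DecisionTreeClassifier(), subset[reduced], y, cv=5).mean()
    # A large difference suggests V plays an important role in predicting Y.
    return acc_X - acc_reduced

# e.g., predictiveness(subset, ["gender", "salary"], ["gender"], "valued_customer")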
Given a set of attributes, V, and a learning algorithm, the prediction cube at granularity
⟨l1, ..., ld⟩ (e.g., ⟨year, state⟩) is a d-dimensional array, in which the value in each cell
                               (e.g., [2010, Illinois]) is the predictiveness of V evaluated on the subset defined by the
                               cell (e.g., the records in the customer table with time in 2010 and location in Illinois).
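As a further sketch under the same assumptions, one granularity of the cube can be materialized by grouping the table on the dimension attributes at the chosen levels and applying the hypothetical predictiveness() helper above to each cell's subset; the column names year and state stand in for levels of the time and location hierarchies.

import pandas as pd

def prediction_cube(D: pd.DataFrame, dims, X_cols, V_cols, y_col) -> pd.Series:
    """Map each cell (e.g., (2010, 'Illinois')) to the predictiveness of V on its subset."""
    return D.groupby(list(dims)).apply(
        lambda cell: predictiveness(cell, X_cols, V_cols, y_col)
    )

# e.g., prediction_cube(customer, ("year", "state"),
#                       ["gender", "salary"], ["gender"], "valued_customer")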
                                 Supporting OLAP roll-up and drill-down operations on a prediction cube is a
                               computational challenge requiring the materialization of cell values at many different
                               granularities. For simplicity, we can consider only full materialization. A naïve way to
                               fully materialize a prediction cube is to exhaustively build models and evaluate them for
                               each cell and granularity. This method is very expensive if the base data set is large.
                               An ensemble method called Probability-Based Ensemble (PBE) was developed as a
                               more feasible alternative. It requires model construction for only the finest-grained
                               cells. OLAP-style bottom-up aggregation is then used to generate the values of the
                               coarser-grained cells.
                                 The prediction of a predictive model can be seen as finding a class label that maxi-
                               mizes a scoring function. The PBE method was developed to approximately make the
                               scoring function of any predictive model distributively decomposable. In our discus-
                               sion of data cube measures in Section 4.2.4, we showed that distributive and algebraic
                               measures can be computed efficiently. Therefore, if the scoring function used is dis-
                               tributively or algebraically decomposable, prediction cubes can also be computed with
                               efficiency. In this way, the PBE method reduces prediction cube computation to data
                               cube computation.
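To illustrate the reduction (a hedged sketch, not the published PBE algorithm), suppose the scoring function's sufficient statistics are simple counts, as in the naïve Bayes case discussed next. Such counts are distributive: the statistics of a coarser cell are just the sums of those of its finest-grained descendant cells, so the roll-up itself is ordinary data cube aggregation. The base cells, counts, and level mappings below are hypothetical.

from collections import Counter, defaultdict

def roll_up(base_stats, parent_of):
    """Sum distributive statistics (here, class counts) from finest-grained cells
    into their parent cells -- the data cube part of the computation."""
    coarse = defaultdict(Counter)
    for base_cell, counts in base_stats.items():
        coarse[parent_of(base_cell)].update(counts)
    return coarse

# Hypothetical finest-grained cells keyed by (month, city), each holding class counts.
base_stats = {
    ("2010-01", "Chicago"):     Counter({"valued": 40, "not_valued": 60}),
    ("2010-02", "Springfield"): Counter({"valued": 25, "not_valued": 75}),
}

# Hypothetical mapping of base cells to the coarser granularity (year, state).
parent = {
    ("2010-01", "Chicago"):     ("2010", "Illinois"),
    ("2010-02", "Springfield"): ("2010", "Illinois"),
}

year_state = roll_up(base_stats, parent.__getitem__)
# year_state[("2010", "Illinois")] -> Counter({'not_valued': 135, 'valued': 65})
# The coarse cell's score is then recomputed from these aggregated statistics,
# with no model rebuilt on the underlying records.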
For example, previous studies have shown that the naïve Bayes classifier has an alge-
braically decomposable scoring function, and the kernel density-based classifier has a
distributively decomposable scoring function.⁸ Therefore, either of these could be used

⁸ Naïve Bayes classifiers are detailed in Chapter 8. Kernel density-based classifiers, such as support vector
machines, are described in Chapter 9.