

                                 Most real-life top-k queries are likely to involve only a small subset of selection
                               attributes. To support high-dimensional ranking cubes, we can carefully select the
                               cuboids that need to be materialized. For example, we could choose to materialize only
                               the 1-D cuboids, each of which contains a single selection dimension. This keeps space
                               overhead low while still delivering high performance when the number of selection
                               dimensions is large. In some cases, there may exist many ranking dimensions to support multiple
                               users with rather different preferences. For example, buyers may search for houses by
                               considering various factors like price, distance to school or shopping, number of years
                               old, floor space, and tax. In this case, a possible solution is to create multiple data
                               partitions, each of which consists of a subset of the ranking dimensions. Query processing
                               may then need to search over a joint space involving multiple data partitions.
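
To make the idea of materializing only 1-D cuboids concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the ranking-cube implementation: the table `HOUSES`, the names `build_1d_cuboids` and `top_k`, and the use of plain tuple-ID sets per cell are all hypothetical, and the interval-based partitioning of the ranking dimensions is omitted for brevity. Each 1-D cuboid maps a selection value to the IDs of matching tuples; a query over several selection dimensions intersects the matching cells and ranks the survivors.

```python
from collections import defaultdict
import heapq

# Hypothetical house listings: (tid, city, type, price, sqft)
HOUSES = [
    (1, "Chicago", "condo", 300, 1200),
    (2, "Chicago", "house", 450, 2000),
    (3, "Urbana", "condo", 150, 900),
    (4, "Chicago", "condo", 250, 1000),
    (5, "Urbana", "house", 200, 1800),
]
# Column index of each selection dimension (assumed schema).
SELECTION_DIMS = {"city": 1, "type": 2}


def build_1d_cuboids(rows):
    """Materialize one cuboid per selection dimension: value -> set of tuple IDs."""
    cuboids = {dim: defaultdict(set) for dim in SELECTION_DIMS}
    for row in rows:
        for dim, col in SELECTION_DIMS.items():
            cuboids[dim][row[col]].add(row[0])
    return cuboids


def top_k(rows, cuboids, selection, score, k):
    """Intersect the 1-D cells matching `selection`, then return the k
    candidates with the smallest `score` (lower is better)."""
    tid_sets = [cuboids[dim].get(value, set()) for dim, value in selection.items()]
    # With no selection condition, every tuple is a candidate.
    candidates = set.intersection(*tid_sets) if tid_sets else {r[0] for r in rows}
    by_tid = {r[0]: r for r in rows}
    return heapq.nsmallest(k, (by_tid[t] for t in candidates), key=score)


if __name__ == "__main__":
    cuboids = build_1d_cuboids(HOUSES)
    # Two cheapest condos in Chicago, ranked by price.
    for row in top_k(HOUSES, cuboids, {"city": "Chicago", "type": "condo"},
                     score=lambda r: r[3], k=2):
        print(row)
```

Storage grows with the number of selection dimensions rather than with the number of their combinations, which is the space saving the text describes; the price is the intersection work done at query time.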
                                 In summary, the general philosophy of ranking cubes is to materialize such cubes
                               on the set of selection dimensions. Use of the interval-based partitioning in ranking
                               dimensions makes the ranking cube efficient and flexible at supporting ad hoc user
                               queries. Various implementation techniques and query optimization methods have been
                               developed for efficient computation and query processing based on this framework.



                       5.4     Multidimensional Data Analysis in Cube Space


                               Data cubes create a flexible and powerful means to group and aggregate data subsets.
                               They allow data to be explored in multiple dimensional combinations and at varying
                               aggregate granularities. This capability greatly increases the analysis bandwidth and
                               supports the effective discovery of interesting patterns and knowledge from data. The use of
                               cube space makes the data space both meaningful and tractable.
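
As a small illustration of grouping and aggregating over multiple dimensional combinations, the following Python sketch computes every cuboid of a tiny fact table. The table `SALES`, the dimension names, and the function `full_cube` are hypothetical, chosen only for this example; SUM is assumed as the aggregate, and the brute-force enumeration is meant to show what a cube contains, not how to compute one efficiently.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales facts: (city, item, year, amount)
SALES = [
    ("Chicago", "TV", 2010, 100),
    ("Chicago", "TV", 2011, 150),
    ("Chicago", "phone", 2010, 80),
    ("Urbana", "TV", 2010, 60),
]
DIMS = ("city", "item", "year")


def full_cube(rows, dims=DIMS):
    """Return {group-by dimensions: {cell key: SUM(amount)}} for every
    subset of `dims`, from the apex cuboid () to the base cuboid."""
    cube = {}
    for n in range(len(dims) + 1):
        for idx in combinations(range(len(dims)), n):
            cuboid = defaultdict(int)
            for row in rows:
                # Project the row onto the chosen dimensions and accumulate.
                cuboid[tuple(row[i] for i in idx)] += row[-1]
            cube[tuple(dims[i] for i in idx)] = dict(cuboid)
    return cube


if __name__ == "__main__":
    cube = full_cube(SALES)
    print(cube[()])                # apex cuboid: grand total
    print(cube[("city",)])         # totals per city
    print(cube[("city", "item")])  # totals per (city, item)
```

With three dimensions there are 2^3 = 8 cuboids, and each corresponds to one granularity at which the same data can be examined.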
                                 This section presents methods of multidimensional data analysis that make use of
                               data cubes to organize data into intuitive regions of interest at varying granularities.
                               Section 5.4.1 presents prediction cubes, a technique for multidimensional data mining
                               that facilitates predictive modeling in multidimensional space. Section 5.4.2 describes
                               how to construct multifeature cubes. These support complex analytical queries involving
                               multiple dependent aggregates at multiple granularities. Finally, Section 5.4.3 describes
                               an interactive method for users to systematically explore cube space. In such
                               exception-based, discovery-driven exploration, interesting exceptions or anomalies in the data are
                               automatically detected and marked for users with visual cues.



                         5.4.1 Prediction Cubes: Prediction Mining in Cube Space
                               Recently, researchers have turned their attention toward multidimensional data mining
                               to uncover knowledge at varying dimensional combinations and granularities. Such
                               mining is also known as exploratory multidimensional data mining and online analytical
                               data mining (OLAM). Multidimensional data space is huge. In preparing the data, how
                               can we identify the interesting subspaces for exploration? To what granularities should
                               we aggregate the data? Multidimensional data mining in cube space organizes data of