
Most real-life top-k queries are likely to involve only a small subset of selection
attributes. To support high-dimensional ranking cubes, we can carefully select the
cuboids that need to be materialized. For example, we could choose to materialize only
the 1-D cuboids that contain single-selection dimensions. This will achieve low space
overhead and still have high performance when the number of selection dimensions
is large. In some cases, there may exist many ranking dimensions to support multiple
users with rather different preferences. For example, buyers may search for houses by
considering various factors like price, distance to school or shopping, number of years
old, floor space, and tax. In this case, a possible solution is to create multiple data
partitions, each of which consists of a subset of the ranking dimensions. The query processing
may need to search over a joint space involving multiple data partitions.

In summary, the general philosophy of ranking cubes is to materialize such cubes
on the set of selection dimensions. Use of the interval-based partitioning in ranking
dimensions makes the ranking cube efficient and flexible at supporting ad hoc user
queries. Various implementation techniques and query optimization methods have been
developed for efficient computation and query processing based on this framework.
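As a rough illustration of materializing only the 1-D cuboids, the following minimal sketch builds one cuboid per selection dimension (mapping each value to the set of matching tuple IDs) and answers a multi-dimension top-k query by intersecting the relevant 1-D cells. The relation `HOUSES`, the helper names `build_1d_cuboids` and `top_k`, and the scoring function are all hypothetical. The sketch also ranks the surviving candidates by brute force and does not model the interval-based partitioning of ranking dimensions that a real ranking cube would use.

```python
from collections import defaultdict
import heapq

# Hypothetical toy relation: selection attributes (city, type) and
# ranking attributes (price, dist). All values are made up for illustration.
HOUSES = [
    {"tid": 1, "city": "Chicago", "type": "condo", "price": 300, "dist": 2.0},
    {"tid": 2, "city": "Chicago", "type": "house", "price": 450, "dist": 1.0},
    {"tid": 3, "city": "Chicago", "type": "condo", "price": 250, "dist": 5.0},
    {"tid": 4, "city": "Urbana", "type": "condo", "price": 150, "dist": 3.0},
    {"tid": 5, "city": "Chicago", "type": "condo", "price": 400, "dist": 0.5},
]
SELECTION_DIMS = ("city", "type")


def build_1d_cuboids(rows, selection_dims):
    """Materialize one 1-D cuboid per selection dimension: value -> set of tids."""
    cuboids = {dim: defaultdict(set) for dim in selection_dims}
    for row in rows:
        for dim in selection_dims:
            cuboids[dim][row[dim]].add(row["tid"])
    return cuboids


def top_k(rows_by_tid, cuboids, selection, score, k):
    """Answer a top-k query by intersecting the 1-D cells named in `selection`.

    `score` maps a row to a number; smaller scores rank higher.
    """
    tid_sets = [cuboids[dim].get(value, set()) for dim, value in selection.items()]
    candidates = set.intersection(*tid_sets) if tid_sets else set(rows_by_tid)
    # Brute-force ranking of the candidates (an assumption of this sketch).
    return heapq.nsmallest(k, (rows_by_tid[t] for t in sorted(candidates)), key=score)


cuboids = build_1d_cuboids(HOUSES, SELECTION_DIMS)
rows_by_tid = {row["tid"]: row for row in HOUSES}
result = top_k(
    rows_by_tid,
    cuboids,
    selection={"city": "Chicago", "type": "condo"},
    score=lambda r: r["price"] + 20 * r["dist"],
    k=2,
)
print([r["tid"] for r in result])
```

Storing only the 1-D cells keeps the space cost linear in the number of selection dimensions, at the price of an intersection step at query time.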

5.4 Multidimensional Data Analysis in Cube Space

Data cubes create a flexible and powerful means to group and aggregate data subsets.
They allow data to be explored in multiple dimensional combinations and at varying
aggregate granularities. This capability greatly increases the analysis bandwidth and
helps effective discovery of interesting patterns and knowledge from data. The use of
cube space makes the data space both meaningful and tractable.

This section presents methods of multidimensional data analysis that make use of
data cubes to organize data into intuitive regions of interest at varying granularities.
Section 5.4.1 presents prediction cubes, a technique for multidimensional data mining
that facilitates predictive modeling in multidimensional space. Section 5.4.2 describes
how to construct multifeature cubes. These support complex analytical queries involving
multiple dependent aggregates at multiple granularities. Finally, Section 5.4.3 describes
an interactive method for users to systematically explore cube space. In such
exception-based, discovery-driven exploration, interesting exceptions or anomalies in the data are
automatically detected and marked for users with visual cues.
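To make concrete what it means for a data cube to group and aggregate data in multiple dimensional combinations and at varying granularities, here is a minimal sketch that computes every group-by (cuboid) of a tiny fact table by brute force. The table `SALES`, the dimension names, and the function `compute_cube` are hypothetical. A practical system would use the efficient cube computation methods discussed elsewhere in this chapter rather than enumerating all 2^n group-bys this way.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (city, quarter, item, sales). Values are made up.
SALES = [
    ("Chicago", "Q1", "TV", 100),
    ("Chicago", "Q2", "TV", 150),
    ("Chicago", "Q1", "phone", 80),
    ("Urbana", "Q1", "TV", 60),
]
DIMS = ("city", "quarter", "item")


def compute_cube(rows, dims):
    """Return {group-by dims: {cell values: summed measure}} for all cuboids."""
    cube = {}
    for size in range(len(dims) + 1):
        for idx in combinations(range(len(dims)), size):
            cuboid = defaultdict(int)
            for row in rows:
                cell = tuple(row[i] for i in idx)  # project onto chosen dims
                cuboid[cell] += row[-1]            # aggregate the measure (sum)
            cube[tuple(dims[i] for i in idx)] = dict(cuboid)
    return cube


cube = compute_cube(SALES, DIMS)
print(cube[()][()])                              # apex cuboid: grand total
print(cube[("city",)][("Chicago",)])             # coarse granularity
print(cube[("city", "item")][("Chicago", "TV")]) # finer granularity
```

Each key of `cube` is one dimensional combination, and moving between keys corresponds to rolling up or drilling down between granularities.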

5.4.1 Prediction Cubes: Prediction Mining in Cube Space

Recently, researchers have turned their attention toward multidimensional data
mining to uncover knowledge at varying dimensional combinations and granularities. Such
mining is also known as exploratory multidimensional data mining and online analytical
data mining (OLAM). Multidimensional data space is huge. In preparing the data, how
can we identify the interesting subspaces for exploration? To what granularities should
we aggregate the data? Multidimensional data mining in cube space organizes data of
