5.4 Multidimensional Data Analysis in Cube Space
region, and month. It is an n-dimensional generalization of the group-by clause. The
attributes specified in the cube by clause are the grouping attributes. Tuples with the
same value on all grouping attributes form one group. Let the groups be $g_1, \ldots, g_r$. For
each group of tuples $g_i$, the maximum price $\max_{g_i}$ among the tuples forming the group
is computed. The variable R is a grouping variable, ranging over all tuples in group $g_i$
that have a price equal to $\max_{g_i}$ (as specified in the such that clause). The sum of sales
of the tuples in $g_i$ that R ranges over is computed and returned with the values of the
grouping attributes of $g_i$.
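The evaluation just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's implementation: the sample tuples, the `ROWS` and `DIMS` names, and the `multifeature_cube` function are all hypothetical, and the cube is computed naively, one cuboid at a time, with `"*"` marking a dimension that has been aggregated away.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales tuples: (item, region, month, price, sales)
ROWS = [
    ("TV", "East", "Jan", 500, 10),
    ("TV", "East", "Jan", 500, 5),
    ("TV", "East", "Feb", 450, 8),
    ("TV", "West", "Jan", 520, 7),
    ("PC", "East", "Jan", 900, 3),
    ("PC", "West", "Feb", 950, 4),
    ("PC", "West", "Feb", 900, 6),
]
DIMS = ("item", "region", "month")


def multifeature_cube(rows, dims=DIMS):
    """For every subset of dims (every cuboid), group the rows and return
    {group key: (max price, sum of sales over the max-price tuples)}."""
    cube = {}
    n = len(dims)
    for k in range(n + 1):
        for kept in combinations(range(n), k):
            groups = defaultdict(list)
            for row in rows:
                # "*" marks a dimension that is aggregated away
                key = tuple(row[i] if i in kept else "*" for i in range(n))
                groups[key].append(row)
            for key, tuples in groups.items():
                max_price = max(t[3] for t in tuples)
                # R ranges over the tuples whose price equals the group maximum
                r = [t for t in tuples if t[3] == max_price]
                cube[key] = (max_price, sum(t[4] for t in r))
    return cube


cube = multifeature_cube(ROWS)
```

With this sample data, `cube[("TV", "East", "Jan")]` is `(500, 15)`, because both tuples in that group carry the maximum price, while `cube[("TV", "*", "*")]` is `(520, 7)`, because only one TV tuple reaches the overall TV maximum.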
The resulting cube is a multifeature cube in that it supports complex data mining
queries for which multiple dependent aggregates are computed at a variety of
granularities. For example, the sum of sales returned in this query is dependent on the
set of maximum price tuples for each group. In general, multifeature cubes give users
the flexibility to define sophisticated, task-specific cubes on which multidimensional
aggregation and OLAP-based mining can be performed.
“How can multifeature cubes be computed efficiently?” The computation of a
multifeature cube depends on the types of aggregate functions used in the cube. In Chapter 4,
we saw that aggregate functions can be categorized as distributive, algebraic, or
holistic. Multifeature cubes can be organized into the same categories and computed
efficiently by minor extension of the cube computation methods in Section 5.2.
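As a reminder of why the category matters for cube computation, the sketch below combines aggregates computed separately on two partitions of one group. The partitions and helper names are hypothetical, chosen only for illustration: a distributive aggregate (max) merges directly, an algebraic one (average) merges through a fixed number of distributive sub-aggregates, and a holistic one (median) cannot be recovered from constant-size partial results.

```python
from statistics import median

# Two hypothetical partitions of one group's price values
part_a = [500, 500, 450]
part_b = [520, 900]

# Distributive: the max of the whole equals the max of the per-partition maxima.
max_all = max(max(part_a), max(part_b))


# Algebraic: the average is derived from a fixed number of distributive
# sub-aggregates (here, sum and count).
def partial_avg(values):
    return (sum(values), len(values))


def merge_avg(p, q):
    return (p[0] + q[0], p[1] + q[1])


total, count = merge_avg(partial_avg(part_a), partial_avg(part_b))
avg_all = total / count

# Holistic: the median has no constant-size summary; combining partition
# medians generally does not give the true median.
median_of_medians = median([median(part_a), median(part_b)])
true_median = median(part_a + part_b)
```

Here `max_all` and `avg_all` match the values computed over the full group, whereas `median_of_medians` (605.0) differs from `true_median` (500), which is why holistic measures need the underlying tuples or an approximation.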
5.4.3 Exception-Based, Discovery-Driven Cube Space Exploration
As studied in previous sections, a data cube may have a large number of cuboids, and
each cuboid may contain a large number of (aggregate) cells. With such an
overwhelmingly large space, it becomes a burden for users to even just browse a cube, let alone think
of exploring it thoroughly. Tools need to be developed to assist users in intelligently
exploring the huge aggregated space of a data cube.
In this section, we describe a discovery-driven approach to exploring cube space.
Precomputed measures indicating data exceptions are used to guide the user in the data
analysis process, at all aggregation levels. We hereafter refer to these measures as
exception indicators. Intuitively, an exception is a data cube cell value that is significantly
different from the value anticipated, based on a statistical model. The model considers
variations and patterns in the measure value across all the dimensions to which a cell
belongs. For example, if the analysis of item-sales data reveals an increase in sales in
December in comparison to all other months, this may seem like an exception in the
time dimension. However, it is not an exception if the item dimension is considered,
since there is a similar increase in sales for other items during December.
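The December example can be made concrete with a small sketch. The code below does not reproduce the statistical model used for the exception indicators; it assumes a much simpler stand-in, in which the anticipated value of a cell is (item total × month total) / grand total and a cell is flagged when its relative deviation from that value exceeds an arbitrary threshold. The data, the `relative_deviations` function, and `THRESHOLD` are all hypothetical.

```python
# Hypothetical monthly sales per item; columns follow MONTHS
MONTHS = ["Oct", "Nov", "Dec"]
SALES = {
    "TV":     [100, 100, 150],
    "PC":     [200, 200, 300],
    "Camera": [50, 50, 75],
    "Phone":  [80, 160, 120],   # the November value is unusually high
}


def relative_deviations(sales, months):
    """Return {(item, month): actual / expected - 1}, where expected is
    item_total * month_total / grand_total (an assumed, simplified model)."""
    row_total = {item: sum(vals) for item, vals in sales.items()}
    col_total = {m: sum(vals[j] for vals in sales.values())
                 for j, m in enumerate(months)}
    grand = sum(row_total.values())
    dev = {}
    for item, vals in sales.items():
        for j, m in enumerate(months):
            expected = row_total[item] * col_total[m] / grand
            dev[(item, m)] = vals[j] / expected - 1.0
    return dev


THRESHOLD = 0.25  # arbitrary cutoff chosen for this illustration
dev = relative_deviations(SALES, MONTHS)
exceptions = sorted(cell for cell, d in dev.items() if abs(d) > THRESHOLD)
```

In this data, December totals are well above October totals, yet no December cell is flagged, because every item rises by a similar factor. The only flagged cell is `("Phone", "Nov")`, whose value departs from what both its item and its month would suggest.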
The model considers exceptions hidden at all aggregated group-bys of a data cube.
Visual cues, such as background color, are used to reflect each cell’s degree of exception,
based on the precomputed exception indicators. Efficient algorithms have been
proposed for cube construction, as discussed in Section 5.2. The computation of exception
indicators can be overlapped with cube construction, so that the overall construction of
data cubes for discovery-driven exploration is efficient.