
12-ch05-187-242-9780123814791
HAN

Chapter 5 Data Cube Technology

to implement prediction cubes efficiently. The PBE method presents a novel approach
to multidimensional data mining in cube space.

5.4.2 Multifeature Cubes: Complex Aggregation at Multiple Granularities
Data cubes facilitate the answering of analytical or mining-oriented queries because they
allow the computation of aggregate data at multiple granularity levels. Traditional data
cubes are typically constructed on commonly used dimensions (e.g., time, location, and
product) using simple measures (e.g., count(), average(), and sum()). In this section, you
will learn a newer way to define data cubes called multifeature cubes. Multifeature cubes
enable more in-depth analysis. They can compute more complex queries whose measures
depend on groupings of multiple aggregates at varying granularity levels. The queries
posed can be much more elaborate and task-specific than traditional queries, as we shall
illustrate in the next examples. Many complex data mining queries can be answered by
multifeature cubes without a significant increase in computational cost, in comparison to
cube computation for simple queries with traditional data cubes.
To illustrate the idea of multifeature cubes, let’s first look at an example of a query on
a simple data cube.

Example 5.19 A simple data cube query. Let the query be “Find the total sales in 2010, broken down
by item, region, and month, with subtotals for each dimension.” To answer this query, a
traditional data cube is constructed that aggregates the total sales at the following eight
different granularity levels: {(item, region, month), (item, region), (item, month), (month,
region), (item), (month), (region), ()}, where () represents all. This data cube is simple in
that it does not involve any dependent aggregates.
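
The eight group-by's listed above can be enumerated programmatically. The following is a minimal Python sketch, assuming a small in-memory list of purchase records; the sample data, the tuple layout, and the function name simple_cube are hypothetical and chosen only for illustration. It is not an efficient cube computation method, since it rescans the data once per group-by.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase records: (item, region, month, year, sales)
PURCHASES = [
    ("TV", "East", "Jan", 2010, 500.0),
    ("TV", "West", "Jan", 2010, 300.0),
    ("Phone", "East", "Feb", 2010, 200.0),
    ("TV", "East", "Jan", 2009, 999.0),  # dropped by the year filter
]
DIMS = ("item", "region", "month")


def simple_cube(rows, year):
    """Return {group-by dimensions: {group key: total sales}}.

    One entry is produced for every subset of DIMS, so three
    dimensions yield 2**3 = 8 group-by's, including () for "all".
    """
    cube = {}
    for k in range(len(DIMS), -1, -1):
        for idx in combinations(range(len(DIMS)), k):
            totals = defaultdict(float)
            for row in rows:
                if row[3] == year:
                    # Project the row onto the chosen dimensions.
                    totals[tuple(row[i] for i in idx)] += row[4]
            cube[tuple(DIMS[i] for i in idx)] = dict(totals)
    return cube


cube = simple_cube(PURCHASES, 2010)
```

Each cell holds a single sum() computed directly from the selected tuples. No aggregate depends on the result of another aggregate, which is what makes this cube simple.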

To illustrate what is meant by “dependent aggregates,” let’s examine a more complex
query, which can be computed with a multifeature cube.

Example 5.20 A complex query involving dependent aggregates. Suppose the query is “Grouping by
all subsets of {item, region, month}, find the maximum price in 2010 for each group and the
total sales among all maximum price tuples.”
The specification of such a query using standard SQL can be long, repetitive, and
difficult to optimize and maintain. Alternatively, it can be specified concisely using an
extended SQL syntax as follows:

select item, region, month, max(price), sum(R.sales)
from Purchases
where year = 2010
cube by item, region, month: R
such that R.price = max(price)
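
To make the intended semantics of this query concrete, here is a minimal Python sketch that evaluates it naively over an in-memory list. The sample records, the tuple layout, and the function name max_price_cube are hypothetical, and the sketch makes no claim about how a system supporting the extended syntax would actually evaluate the query.

```python
from itertools import combinations

# Hypothetical purchase records: (item, region, month, year, price, sales)
PURCHASES = [
    ("TV", "East", "Jan", 2010, 900.0, 5.0),
    ("TV", "East", "Jan", 2010, 900.0, 2.0),
    ("TV", "West", "Jan", 2010, 700.0, 4.0),
    ("Phone", "East", "Feb", 2010, 400.0, 9.0),
    ("TV", "East", "Jan", 2009, 950.0, 1.0),  # dropped by the year filter
]
DIMS = ("item", "region", "month")


def max_price_cube(rows, year):
    """Return {group-by dimensions: {group key: (max price, total sales)}}.

    The total sales of a group is summed only over the tuples whose
    price equals the group's maximum price (the role played by R).
    """
    selected = [r for r in rows if r[3] == year]
    cube = {}
    for k in range(len(DIMS) + 1):
        for idx in combinations(range(len(DIMS)), k):
            groups = {}
            for r in selected:
                groups.setdefault(tuple(r[i] for i in idx), []).append(r)
            cell = {}
            for key, members in groups.items():
                top = max(r[4] for r in members)  # first aggregate
                # Dependent aggregate: restricted by the result of max().
                cell[key] = (top, sum(r[5] for r in members if r[4] == top))
            cube[tuple(DIMS[i] for i in idx)] = cell
    return cube


cube = max_price_cube(PURCHASES, 2010)
```

The sum over sales cannot be computed until max(price) is known for the group, so the second aggregate depends on the first.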
The tuples representing purchases in 2010 are first selected. The cube by clause computes
aggregates (or group-by’s) for all possible combinations of the attributes item,
