HAN
230 Chapter 5 Data Cube Technology 2011/6/1 3:19 Page 230 #44
to implement prediction cubes efficiently. The PBE method presents a novel approach
to multidimensional data mining in cube space.
5.4.2 Multifeature Cubes: Complex Aggregation at Multiple Granularities
Data cubes facilitate the answering of analytical or mining-oriented queries because they allow
the computation of aggregate data at multiple granularity levels. Traditional data cubes
are typically constructed on commonly used dimensions (e.g., time, location, and product)
using simple measures (e.g., count(), average(), and sum()). In this section, you will
learn a newer way to define data cubes, called multifeature cubes. Multifeature cubes
enable more in-depth analysis. They can compute more complex queries whose
measures depend on groupings of multiple aggregates at varying granularity levels. The
queries posed can be much more elaborate and task-specific than traditional queries,
as we shall illustrate in the next examples. Many complex data mining queries can be
answered by multifeature cubes without a significant increase in computational cost
compared with cube computation for simple queries on traditional data cubes.
To illustrate the idea of multifeature cubes, let’s first look at an example of a query on
a simple data cube.
Example 5.19 A simple data cube query. Let the query be “Find the total sales in 2010, broken down
by item, region, and month, with subtotals for each dimension.” To answer this query, a
traditional data cube is constructed that aggregates the total sales at the following eight
different granularity levels: {(item, region, month), (item, region), (item, month), (month,
region), (item), (month), (region), ()}, where () represents all. This data cube is simple in
that it does not involve any dependent aggregates.
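To make the eight granularity levels concrete, the following minimal Python sketch computes total sales for every subset of {item, region, month} over a small, invented set of 2010 sales rows. The names ROWS, DIMS, and simple_cube are hypothetical, and the naive loop recomputes every group-by from the base rows; it does not attempt any of the efficient cube computation methods discussed in this chapter.

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical 2010 sales rows: (item, region, month, sales). Values are invented.
ROWS = [
    ("TV", "east", "Jan", 100.0),
    ("TV", "west", "Jan", 150.0),
    ("phone", "east", "Feb", 80.0),
    ("TV", "east", "Feb", 120.0),
]
DIMS = ("item", "region", "month")


def simple_cube(rows, dims=DIMS):
    """Return {grouping: {group_key: total_sales}} for all 2**len(dims) groupings."""
    cube = {}
    # k = 3, 2, 1, 0 dimensions; k = 0 yields the apex grouping ().
    for k in range(len(dims), -1, -1):
        for idx in combinations(range(len(dims)), k):
            totals = defaultdict(float)
            for row in rows:
                *dim_values, sales = row
                key = tuple(dim_values[i] for i in idx)
                totals[key] += sales  # sum(sales) for this group
            cube[tuple(dims[i] for i in idx)] = dict(totals)
    return cube
```

For these rows, simple_cube(ROWS) contains eight groupings; the apex cell () holds the grand total 450.0, and the (item) grouping holds 370.0 for TV and 80.0 for phone. Each aggregate here depends only on the base rows of its own group, which is what makes the cube "simple."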
To illustrate what is meant by “dependent aggregates,” let’s examine a more complex
query, which can be computed with a multifeature cube.
Example 5.20 A complex query involving dependent aggregates. Suppose the query is “Grouping by
all subsets of {item, region, month}, find the maximum price in 2010 for each group and the
total sales among all maximum price tuples.”
The specification of such a query using standard SQL can be long, repetitive, and
difficult to optimize and maintain. Alternatively, it can be specified concisely using an
extended SQL syntax as follows:
select item, region, month, max(price), sum(R.sales)
from Purchases
where year = 2010
cube by item, region, month: R
such that R.price = max(price)
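The semantics of this extended query can be illustrated with a naive Python sketch, assuming the Purchases relation is a list of dictionaries with the fields named in the query (item, region, month, year, price, sales). The function name multifeature_cube and the sample records are hypothetical. For each grouping it first computes max(price), then forms the dependent set R of tuples whose price equals that maximum, and finally sums R.sales; it evaluates every grouping independently and is not an efficient multifeature cube algorithm.

```python
from itertools import combinations

# Hypothetical Purchases records; field names follow the query, values are invented.
PURCHASES = [
    {"item": "TV", "region": "east", "month": "Jan", "year": 2010, "price": 500.0, "sales": 2.0},
    {"item": "TV", "region": "west", "month": "Jan", "year": 2010, "price": 500.0, "sales": 3.0},
    {"item": "TV", "region": "east", "month": "Feb", "year": 2010, "price": 450.0, "sales": 5.0},
    {"item": "phone", "region": "east", "month": "Feb", "year": 2010, "price": 300.0, "sales": 4.0},
    {"item": "TV", "region": "east", "month": "Jan", "year": 2009, "price": 900.0, "sales": 1.0},
]
DIMS = ("item", "region", "month")


def multifeature_cube(purchases, dims=DIMS, year=2010):
    """Return {grouping: {group_key: (max_price, sum_of_R_sales)}} for every grouping."""
    selected = [p for p in purchases if p["year"] == year]  # where year = 2010
    result = {}
    for k in range(len(dims), -1, -1):  # cube by item, region, month
        for grouping in combinations(dims, k):
            groups = {}
            for p in selected:
                groups.setdefault(tuple(p[d] for d in grouping), []).append(p)
            cells = {}
            for key, members in groups.items():
                max_price = max(p["price"] for p in members)          # max(price)
                r = [p for p in members if p["price"] == max_price]   # R.price = max(price)
                cells[key] = (max_price, sum(p["sales"] for p in r))  # sum(R.sales)
            result[grouping] = cells
    return result
```

With these records, the 2009 tuple is filtered out, so the apex cell () yields (500.0, 5.0): the two January TV purchases share the maximum price and together account for 5 units. In the (month) grouping, February yields (450.0, 5.0), because the phone purchase is not a maximum-price tuple. The sum(R.sales) aggregate therefore depends on the max(price) aggregate computed for the same group.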
The tuples representing purchases in 2010 are first selected. The cube by clause computes
aggregates (or group-by's) for all possible combinations of the attributes item,