
          230   Chapter 5 Data Cube Technology



                         to implement prediction cubes efficiently. The PBE method presents a novel approach
                         to multidimensional data mining in cube space.


                   5.4.2 Multifeature Cubes: Complex Aggregation
                         at Multiple Granularities
                         Data cubes facilitate the answering of analytical or mining-oriented queries as they allow
                         the computation of aggregate data at multiple granularity levels. Traditional data cubes
                         are typically constructed on commonly used dimensions (e.g., time, location, and
                         product) using simple measures (e.g., count(), average(), and sum()). In this section, you will
                         learn a newer way to define data cubes called multifeature cubes. Multifeature cubes
                         enable more in-depth analysis. They can compute more complex queries whose
                         measures depend on groupings of multiple aggregates at varying granularity levels. The
                         queries posed can be much more elaborate and task-specific than traditional queries,
                         as we shall illustrate in the next examples. Many complex data mining queries can be
                         answered by multifeature cubes without a significant increase in computational cost, in
                         comparison to cube computation for simple queries with traditional data cubes.
                           To illustrate the idea of multifeature cubes, let’s first look at an example of a query on
                         a simple data cube.

           Example 5.19 A simple data cube query. Let the query be “Find the total sales in 2010, broken down
                         by item, region, and month, with subtotals for each dimension.” To answer this query, a
                         traditional data cube is constructed that aggregates the total sales at the following eight
                         different granularity levels: {(item, region, month), (item, region), (item, month), (month,
                         region), (item), (month), (region), ()}, where () represents all. This data cube is simple in
                         that it does not involve any dependent aggregates.
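The eight group-by's of Example 5.19 can be enumerated mechanically. The following is a minimal Python sketch, not an efficient cube algorithm: it rescans the data once per group-by. The sample tuples and the names ROWS, DIMS, and simple_cube are assumptions made for illustration only.

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical 2010 sales tuples: (item, region, month, sales).
ROWS = [
    ("TV", "East", "Jan", 100),
    ("TV", "West", "Jan", 150),
    ("PC", "East", "Feb", 200),
    ("PC", "East", "Jan", 50),
]
DIMS = ("item", "region", "month")


def simple_cube(rows, dims=DIMS):
    """Map each subset of dims to {group key: total sales}."""
    cube = {}
    # Walk from the finest granularity (all dimensions) down to () = all.
    for k in range(len(dims), -1, -1):
        for idx in combinations(range(len(dims)), k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in idx)
                totals[key] += row[-1]  # sum() is a simple, independent measure
            cube[tuple(dims[i] for i in idx)] = dict(totals)
    return cube


cube = simple_cube(ROWS)
```

Here cube[("item",)] holds the subtotal per item, and cube[()] holds the single grand total for the group-by (), that is, all.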

                           To illustrate what is meant by “dependent aggregates,” let’s examine a more complex
                         query, which can be computed with a multifeature cube.

           Example 5.20 A complex query involving dependent aggregates. Suppose the query is “Grouping by
                         all subsets of {item, region, month}, find the maximum price in 2010 for each group and the
                         total sales among all maximum price tuples.”
                           The specification of such a query using standard SQL can be long, repetitive, and
                         difficult to optimize and maintain. Alternatively, it can be specified concisely using an
                         extended SQL syntax as follows:

                                  select  item, region, month, max(price), sum(R.sales)
                                  from    Purchases
                                  where   year = 2010
                                  cube by  item, region, month: R
                                  such that R.price = max(price)
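One way to read this query is sketched below in Python. This is a naive illustration under stated assumptions, not the evaluation strategy of any particular system: the sample tuples, their column layout, and the names PURCHASES, DIMS, and multifeature_cube are hypothetical. For every group-by it first finds max(price) within each group, then sums sales over only those tuples R whose price equals that maximum.

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical Purchases tuples: (item, region, month, year, price, sales).
PURCHASES = [
    ("TV", "East", "Jan", 2010, 500, 10),
    ("TV", "East", "Jan", 2010, 500, 5),
    ("TV", "West", "Jan", 2010, 450, 7),
    ("PC", "East", "Feb", 2010, 900, 3),
    ("PC", "East", "Feb", 2009, 999, 8),  # removed by "where year = 2010"
]
DIMS = ("item", "region", "month")


def multifeature_cube(rows, year=2010, dims=DIMS):
    """Map each subset of dims to {group key: (max price, sales at max price)}."""
    selected = [r for r in rows if r[3] == year]  # where year = 2010
    cube = {}
    for k in range(len(dims), -1, -1):  # cube by item, region, month
        for idx in combinations(range(len(dims)), k):
            groups = defaultdict(list)
            for r in selected:
                groups[tuple(r[i] for i in idx)].append(r)
            result = {}
            for key, members in groups.items():
                max_price = max(r[4] for r in members)
                # R: tuples in the group such that R.price = max(price)
                total = sum(r[5] for r in members if r[4] == max_price)
                result[key] = (max_price, total)
            cube[tuple(dims[i] for i in idx)] = result
    return cube


mf_cube = multifeature_cube(PURCHASES)
```

The second aggregate, sum(R.sales), cannot be computed until the first, max(price), is known for the group. That dependency is what distinguishes this query from the simple cube of Example 5.19.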
                         The tuples representing purchases in 2010 are first selected. The cube by clause
                         computes aggregates (or group-by’s) for all possible combinations of the attributes item,