Optimization Technique 4: The Apriori pruning method can be exploited to compute iceberg cubes efficiently. The Apriori property,³ in the context of data cubes, can be stated as follows: If a given cell does not satisfy minimum support, then no descendant of the cell (i.e., a more specialized cell) will satisfy minimum support either. This property can be used to substantially reduce the computation of iceberg cubes.
   Recall that the specification of iceberg cubes contains an iceberg condition, which is a constraint on the cells to be materialized. A common iceberg condition is that the cells must satisfy a minimum support threshold, such as a minimum count or sum. In this situation, the Apriori property can be used to prune away the exploration of a cell's descendants. For example, if the count of a cell, c, in a cuboid is less than a minimum support threshold, v, then the count of any of c's descendant cells in the lower-level cuboids can never be greater than or equal to v; those descendant cells can therefore be pruned.
   In other words, if a condition (e.g., the iceberg condition specified in the having clause) is violated for some cell c, then every descendant of c will also violate that condition. Measures that obey this property are known as antimonotonic.⁴ This form of pruning was made popular in frequent pattern mining, yet also aids in data cube computation by cutting processing time and disk space requirements. It can lead to a more focused analysis because cells that cannot pass the threshold are unlikely to be of interest.
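   To make the pruning concrete, the following is a minimal Python sketch of count-based Apriori pruning during a bottom-up, BUC-style expansion of a toy three-dimensional fact table. The table contents, the MIN_SUP threshold, and the expand() helper are illustrative assumptions only, not the implementation of any particular method discussed in this chapter.

    # Toy fact table: each tuple holds values for three dimensions (A, B, C).
    rows = [
        ("a1", "b1", "c1"), ("a1", "b1", "c2"),
        ("a1", "b2", "c1"), ("a2", "b1", "c1"),
    ]
    MIN_SUP = 2      # iceberg condition: count(*) >= 2 (illustrative threshold)
    ALL = "*"        # '*' marks a dimension that has been aggregated away

    def expand(rows, cell, dim, results):
        # A cell is a tuple such as ("a1", "*", "*"). Because count is
        # antimonotonic, a cell whose count falls below MIN_SUP is not
        # specialized further: none of its descendants can reach the threshold.
        if len(rows) < MIN_SUP:              # Apriori pruning happens here
            return
        results[cell] = len(rows)            # materialize the qualifying cell
        for d in range(dim, len(cell)):      # specialize each remaining dimension
            groups = {}
            for row in rows:
                groups.setdefault(row[d], []).append(row)
            for value, group in groups.items():
                child = cell[:d] + (value,) + cell[d + 1:]
                expand(group, child, d + 1, results)

    results = {}
    expand(rows, (ALL, ALL, ALL), 0, results)
    for cell, count in sorted(results.items()):
        print(cell, count)   # only cells satisfying the iceberg condition appear

   On this toy table, only the seven cells whose count reaches 2 are materialized; descendants of failing cells such as ("a2", "*", "*") are never examined at all, which is precisely the saving that the Apriori property provides.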

   In the following sections, we introduce several popular methods for efficient cube computation that exploit these optimization strategies.


                 5.2     Data Cube Computation Methods


                         Data cube computation is an essential task in data warehouse implementation. The pre-
                         computation of all or part of a data cube can greatly reduce the response time and
                         enhance the performance of online analytical processing. However, such computation
                         is challenging because it may require substantial computational time and storage
                         space. This section explores efficient methods for data cube computation. Section 5.2.1
                         describes the multiway array aggregation (MultiWay) method for computing full cubes.
                         Section 5.2.2 describes a method known as BUC, which computes iceberg cubes from
                         the apex cuboid downward. Section 5.2.3 describes the Star-Cubing method, which
                         integrates top-down and bottom-up computation.
                           Finally, Section 5.2.4 describes a shell-fragment cubing approach that computes shell
                         fragments for efficient high-dimensional OLAP. To simplify our discussion, we exclude


³ The Apriori property was proposed in the Apriori algorithm for association rule mining by Agrawal and Srikant [AS94b]. Many algorithms in association rule mining have adopted this property (see Chapter 6).
⁴ Antimonotone is based on condition violation. This differs from monotone, which is based on condition satisfaction.