Page 230 -
P. 230

Page 193
                                                                                    #7
                                                             2011/6/1
                                                                     3:19
                          HAN 12-ch05-187-242-9780123814791
                                                    5.1 Data Cube Computation: Preliminary Concepts  193


                               data representations. The following are general optimization techniques for efficient
                               computation of data cubes.

                               Optimization Technique 1: Sorting, hashing, and grouping. Sorting, hashing, and
                                 grouping operations should be applied to the dimension attributes to reorder and
                                 cluster related tuples.
                                    In cube computation, aggregation is performed on the tuples (or cells) that share
                                 the same set of dimension values. Thus, it is important to explore sorting, hashing,
                                 and grouping operations to access and group such data together to facilitate compu-
                                 tation of such aggregates.
                                    To compute total sales by branch, day, and item, for example, it can be more
                                 efficient to sort tuples or cells by branch, and then by day, and then group them
                                 according to the item name. Efficient implementations of such operations in large
                                 data sets have been extensively studied in the database research community. Such
                                 implementations can be extended to data cube computation.
                                    This technique can also be further extended to perform shared-sorts (i.e., sharing
                                 sorting costs across multiple cuboids when sort-based methods are used), or to per-
                                 form shared-partitions (i.e., sharing the partitioning cost across multiple cuboids
                                 when hash-based algorithms are used).
                               Optimization Technique 2: Simultaneous aggregation and caching of intermediate
                                 results. In cube computation, it is efficient to compute higher-level aggregates from
                                 previously computed lower-level aggregates, rather than from the base fact table.
                                 Moreover, simultaneous aggregation from cached intermediate computation results
                                 may lead to the reduction of expensive disk input/output (I/O) operations.
                                    To compute sales by branch, for example, we can use the intermediate results
                                 derived from the computation of a lower-level cuboid such as sales by branch and day.
                                 This technique can be further extended to perform amortized scans (i.e., computing
                                 as many cuboids as possible at the same time to amortize disk reads).

                               Optimization Technique 3: Aggregation from the smallest child when there exist mul-
                                 tiple child cuboids. When there exist multiple child cuboids, it is usually more
                                 efficient to compute the desired parent (i.e., more generalized) cuboid from the
                                 smallest, previously computed child cuboid.
                                    To compute a sales cuboid, C branch , when there exist two previously computed
                                 cuboids, C {branch,year} and C {branch,item} , for example, it is obviously more efficient to
                                 compute C branch from the former than from the latter if there are many more distinct
                                 items than distinct years.

                               Many other optimization techniques may further improve computational efficiency. For
                               example, string dimension attributes can be mapped to integers with values ranging
                               from zero to the cardinality of the attribute.
                                 In iceberg cube computation the following optimization technique plays a particu-
                               larly important role.
   225   226   227   228   229   230   231   232   233   234   235