Page 232 -
P. 232

#9
                                                                           Page 195
                                                                     3:19
                                                             2011/6/1
                          HAN 12-ch05-187-242-9780123814791
                                                               5.2 Data Cube Computation Methods  195


                               the cuboids that would be generated by climbing up any existing hierarchies for the
                               dimensions. Those cube types can be computed by extension of the discussed methods.
                               Methods for the efficient computation of closed cubes are left as an exercise for interested
                               readers.


                         5.2.1 Multiway Array Aggregation for Full
                               Cube Computation
                               The multiway array aggregation (or simply MultiWay) method computes a full data
                               cube by using a multidimensional array as its basic data structure. It is a typical MOLAP
                               approach that uses direct array addressing, where dimension values are accessed via the
                               position or index of their corresponding array locations. Hence, MultiWay cannot per-
                               form any value-based reordering as an optimization technique. A different approach is
                               developed for the array-based cube construction, as follows:
                               1. Partition the array into chunks. A chunk is a subcube that is small enough to fit into
                                 the memory available for cube computation. Chunking is a method for dividing an
                                 n-dimensional array into small n-dimensional chunks, where each chunk is stored as
                                 an object on disk. The chunks are compressed so as to remove wasted space resulting
                                 from empty array cells. A cell is empty if it does not contain any valid data (i.e., its
                                 cell count is 0). For instance, “chunkID + offset” can be used as a cell-addressing
                                 mechanism to compress a sparse array structure and when searching for cells within
                                 a chunk. Such a compression technique is powerful at handling sparse cubes, both on
                                 disk and in memory.

                               2. Compute aggregates by visiting (i.e., accessing the values at) cube cells. The order in
                                 which cells are visited can be optimized so as to minimize the number of times that
                                 each cell must be revisited, thereby reducing memory access and storage costs. The
                                 trick is to exploit this ordering so that portions of the aggregate cells in multiple
                                 cuboids can be computed simultaneously, and any unnecessary revisiting of cells is
                                 avoided.

                               This chunking technique involves “overlapping” some of the aggregation computations;
                               therefore, it is referred to as multiway array aggregation. It performs simultaneous
                               aggregation, that is, it computes aggregations simultaneously on multiple dimensions.
                                 We explain this approach to array-based cube construction by looking at a concrete
                               example.

                  Example 5.4 Multiway array cube computation. Consider a 3-D data array containing the three
                               dimensions A, B, and C. The 3-D array is partitioned into small, memory-based chunks.
                               In this example, the array is partitioned into 64 chunks as shown in Figure 5.3. Dimen-
                               sion A is organized into four equal-sized partitions: a 0 , a 1 , a 2 , and a 3 . Dimensions B
                               and C are similarly organized into four partitions each. Chunks 1, 2, ..., 64 correspond
                               to the subcubes a 0 b 0 c 0 , a 1 b 0 c 0 , ..., a 3 b 3 c 3 , respectively. Suppose that the cardinality of
   227   228   229   230   231   232   233   234   235   236   237