Page 232 -
P. 232
#9
Page 195
3:19
2011/6/1
HAN 12-ch05-187-242-9780123814791
5.2 Data Cube Computation Methods 195
the cuboids that would be generated by climbing up any existing hierarchies for the
dimensions. Those cube types can be computed by extension of the discussed methods.
Methods for the efficient computation of closed cubes are left as an exercise for interested
readers.
5.2.1 Multiway Array Aggregation for Full
Cube Computation
The multiway array aggregation (or simply MultiWay) method computes a full data
cube by using a multidimensional array as its basic data structure. It is a typical MOLAP
approach that uses direct array addressing, where dimension values are accessed via the
position or index of their corresponding array locations. Hence, MultiWay cannot per-
form any value-based reordering as an optimization technique. A different approach is
developed for the array-based cube construction, as follows:
1. Partition the array into chunks. A chunk is a subcube that is small enough to fit into
the memory available for cube computation. Chunking is a method for dividing an
n-dimensional array into small n-dimensional chunks, where each chunk is stored as
an object on disk. The chunks are compressed so as to remove wasted space resulting
from empty array cells. A cell is empty if it does not contain any valid data (i.e., its
cell count is 0). For instance, “chunkID + offset” can be used as a cell-addressing
mechanism to compress a sparse array structure and when searching for cells within
a chunk. Such a compression technique is powerful at handling sparse cubes, both on
disk and in memory.
2. Compute aggregates by visiting (i.e., accessing the values at) cube cells. The order in
which cells are visited can be optimized so as to minimize the number of times that
each cell must be revisited, thereby reducing memory access and storage costs. The
trick is to exploit this ordering so that portions of the aggregate cells in multiple
cuboids can be computed simultaneously, and any unnecessary revisiting of cells is
avoided.
This chunking technique involves “overlapping” some of the aggregation computations;
therefore, it is referred to as multiway array aggregation. It performs simultaneous
aggregation, that is, it computes aggregations simultaneously on multiple dimensions.
We explain this approach to array-based cube construction by looking at a concrete
example.
Example 5.4 Multiway array cube computation. Consider a 3-D data array containing the three
dimensions A, B, and C. The 3-D array is partitioned into small, memory-based chunks.
In this example, the array is partitioned into 64 chunks as shown in Figure 5.3. Dimen-
sion A is organized into four equal-sized partitions: a 0 , a 1 , a 2 , and a 3 . Dimensions B
and C are similarly organized into four partitions each. Chunks 1, 2, ..., 64 correspond
to the subcubes a 0 b 0 c 0 , a 1 b 0 c 0 , ..., a 3 b 3 c 3 , respectively. Suppose that the cardinality of