Page 239 -
P. 239
12-ch05-187-242-9780123814791
HAN
202 Chapter 5 Data Cube Technology 2011/6/1 3:19 Page 202 #16
Algorithm: BUC. Algorithm for the computation of sparse and iceberg cubes.
Input:
input: the relation to aggregate;
dim: the starting dimension for this iteration.
Globals:
constant numDims: the total number of dimensions;
constant cardinality[numDims]: the cardinality of each dimension;
constant min sup: the minimum number of tuples in a partition for it to be output;
outputRec: the current output record;
dataCount[numDims]: stores the size of each partition. dataCount[i] is a list of integers
of size cardinality[i].
Output: Recursively output the iceberg cube cells satisfying the minimum support.
Method:
(1) Aggregate(input); // Scan input to compute measure, e.g., count. Place result in outputRec.
(2) if input.count() == 1 then // Optimization
WriteDescendants(input[0], dim); return;
endif
(3) write outputRec;
(4) for (d = dim; d < numDims; d + +) do //Partition each dimension
(5) C = cardinality[d];
(6) Partition(input, d, C, dataCount[d]); //create C partitions of data for dimension d
(7) k = 0;
(8) for (i = 0;i < C;i + +) do // for each partition (each value of dimension d)
(9) c = dataCount[d][i];
(10) if c >= min sup then // test the iceberg condition
(11) outputRec.dim[d] = input[k].dim[d];
(12) BUC(input[k..k + c − 1], d + 1); // aggregate on next dimension
(13) endif
(14) k +=c;
(15) endfor
(16) outputRec.dim[d] = all;
(17) endfor
Figure 5.6 BUC algorithm for sparse or iceberg cube computation. Source: Beyer and Ramakrishnan
[BR99].
Suppose (a 1 , ∗ , ∗ , ∗) satisfies the minimum support, in which case a recursive call is
made on the partition for a 1 . BUC partitions a 1 on the dimension B. It checks the count
of (a 1 , b 1 , ∗ , ∗) to see if it satisfies the minimum support. If it does, it outputs the aggre-
gated tuple to the AB group-by and recurses on (a 1 , b 1 , ∗ , ∗) to partition on C, starting