Page 239 -
P. 239

12-ch05-187-242-9780123814791
                         HAN

          202   Chapter 5 Data Cube Technology              2011/6/1  3:19 Page 202  #16



                         Algorithm: BUC. Algorithm for the computation of sparse and iceberg cubes.
                         Input:
                              input: the relation to aggregate;
                              dim: the starting dimension for this iteration.
                         Globals:

                              constant numDims: the total number of dimensions;
                              constant cardinality[numDims]: the cardinality of each dimension;
                              constant min sup: the minimum number of tuples in a partition for it to be output;
                              outputRec: the current output record;
                              dataCount[numDims]: stores the size of each partition. dataCount[i] is a list of integers
                              of size cardinality[i].
                         Output: Recursively output the iceberg cube cells satisfying the minimum support.
                         Method:
                         (1) Aggregate(input); // Scan input to compute measure, e.g., count. Place result in outputRec.
                         (2) if input.count() == 1 then // Optimization
                               WriteDescendants(input[0], dim); return;
                            endif
                         (3) write outputRec;
                         (4) for (d = dim; d < numDims; d + +) do //Partition each dimension
                         (5)   C = cardinality[d];
                         (6)   Partition(input, d, C, dataCount[d]); //create C partitions of data for dimension d
                         (7)   k = 0;
                         (8)   for (i = 0;i < C;i + +) do // for each partition (each value of dimension d)
                         (9)        c = dataCount[d][i];
                         (10)       if c >= min sup then // test the iceberg condition
                         (11)            outputRec.dim[d] = input[k].dim[d];
                         (12)            BUC(input[k..k + c − 1], d + 1); // aggregate on next dimension
                         (13)       endif
                         (14)       k +=c;
                         (15)  endfor
                         (16)  outputRec.dim[d] = all;
                         (17) endfor


               Figure 5.6 BUC algorithm for sparse or iceberg cube computation. Source: Beyer and Ramakrishnan
                         [BR99].


                         Suppose (a 1 , ∗ , ∗ , ∗) satisfies the minimum support, in which case a recursive call is
                         made on the partition for a 1 . BUC partitions a 1 on the dimension B. It checks the count
                         of (a 1 , b 1 , ∗ , ∗) to see if it satisfies the minimum support. If it does, it outputs the aggre-
                         gated tuple to the AB group-by and recurses on (a 1 , b 1 , ∗ , ∗) to partition on C, starting
   234   235   236   237   238   239   240   241   242   243   244