Page 240 -

P. 240

2011/6/1
12-ch05-187-242-9780123814791
#17
3:19 Page 203
HAN
5.2 Data Cube Computation Methods 203

d 1 d 2

a 1 b 1 c 1

c 2

b 2
b 3

b 4

a 2

a 3

a 4

Figure 5.7 BUC partitioning snapshot given an example 4-D data set.

with c 1 . Suppose the cell count for (a 1 , b 1 , c 1 , ∗) is 2, which does not satisfy the mini-
mum support. According to the Apriori property, if a cell does not satisfy the minimum
support, then neither can any of its descendants. Therefore, BUC prunes any further
exploration of (a 1 , b 1 , c 1 , ∗). That is, it avoids partitioning this cell on dimension D. It
backtracks to the a 1 , b 1 partition and recurses on (a 1 , b 1 , c 2 , ∗), and so on. By checking
the iceberg condition each time before performing a recursive call, BUC saves a great
deal of processing time whenever a cell’s count does not satisfy the minimum support.
The partition process is facilitated by a linear sorting method, CountingSort. Count-
ingSort is fast because it does not perform any key comparisons to ﬁnd partition
boundaries. In addition, the counts computed during the sort can be reused to com-
pute the group-by’s in BUC. Line 2 is an optimization for partitions having a count of 1
such as (a 1 , b 2 , ∗ , ∗) in our example. To save on partitioning costs, the count is written

235 236 237 238 239 240 241 242 243 244 245