

Table 6.2 Mining the FP-Tree by Creating Conditional (Sub-)Pattern Bases

Item  Conditional Pattern Base          Conditional FP-tree        Frequent Patterns Generated
I5    {{I2, I1: 1}, {I2, I1, I3: 1}}    ⟨I2: 2, I1: 2⟩             {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4    {{I2, I1: 1}, {I2: 1}}            ⟨I2: 2⟩                    {I2, I4: 2}
I3    {{I2, I1: 2}, {I2: 2}, {I1: 2}}   ⟨I2: 4, I1: 2⟩, ⟨I1: 2⟩    {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}
I1    {{I2: 4}}                         ⟨I2: 4⟩                    {I2, I1: 4}



[Figure: root null{}, one branch I2:4 → I1:2 and one branch I1:2; the header table lists I2 (support count 4, node-link) and I1 (support count 4, node-link).]

Figure 6.8 The conditional FP-tree associated with the conditional node I3.

Similar to the preceding analysis, I3's conditional pattern base is {{I2, I1: 2}, {I2: 2}, {I1: 2}}. Its conditional FP-tree has two branches, ⟨I2: 4, I1: 2⟩ and ⟨I1: 2⟩, as shown in Figure 6.8, which generates the set of patterns {{I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}}. Finally, I1's conditional pattern base is {{I2: 4}}, with an FP-tree that contains only one node, ⟨I2: 4⟩, which generates one frequent pattern, {I2, I1: 4}. This mining process is summarized in Figure 6.9.
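To make the counting in Table 6.2 concrete, the following small Python check (an illustration, not the book's pseudocode) sums, over the entries of a conditional pattern base, the count of every item combination occurring in an entry, and keeps each combination that reaches the minimum support count together with the suffix item. Run on I3's conditional pattern base with a minimum support count of 2, it reproduces the I3 row of the table.

    from collections import Counter
    from itertools import combinations

    def patterns_for_suffix(cond_base, suffix, min_sup):
        # Brute-force check of one Table 6.2 row: count every item combination
        # appearing in the conditional pattern base and keep those meeting
        # min_sup, each extended with the suffix item.
        support = Counter()
        for prefix_path, count in cond_base:
            for r in range(1, len(prefix_path) + 1):
                for combo in combinations(prefix_path, r):
                    support[frozenset(combo)] += count
        return {tuple(sorted(items)) + (suffix,): cnt
                for items, cnt in support.items() if cnt >= min_sup}

    # I3's conditional pattern base: {{I2, I1: 2}, {I2: 2}, {I1: 2}}
    i3_base = [(("I2", "I1"), 2), (("I2",), 2), (("I1",), 2)]
    print(patterns_for_suffix(i3_base, "I3", min_sup=2))
    # -> {('I2', 'I3'): 4, ('I1', 'I3'): 4, ('I1', 'I2', 'I3'): 2}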

                                 The FP-growth method transforms the problem of finding long frequent patterns
                               into searching for shorter ones in much smaller conditional databases recursively and
                               then concatenating the suffix. It uses the least frequent items as a suffix, offering good
                               selectivity. The method substantially reduces the search costs.
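The recursion can be sketched compactly. The following Python outline is an illustrative simplification, not the book's algorithm: each conditional database is kept as a plain list of (transaction, count) pairs rather than as a conditional FP-tree. It grows patterns by appending each locally frequent item to the current suffix and recursing on that item's conditional pattern base; the transaction data is the running example (Table 6.1), with items listed in descending support order.

    from collections import Counter

    def pattern_growth(db, suffix, min_sup, results):
        # db: list of (transaction, count) pairs; each transaction is a tuple
        # of items in a fixed global order (descending support). A real
        # FP-growth implementation stores each conditional database as a
        # conditional FP-tree; plain lists keep this sketch short.
        support = Counter()
        for items, count in db:
            for item in items:
                support[item] += count
        for item, sup in support.items():
            if sup < min_sup:
                continue
            pattern = suffix + (item,)      # grow by concatenating the suffix
            results[frozenset(pattern)] = sup
            # item's conditional pattern base: the prefix of every transaction
            # that contains it (the fixed order ensures each pattern is
            # generated exactly once)
            cond_base = [(items[:items.index(item)], count)
                         for items, count in db
                         if item in items and items.index(item) > 0]
            if cond_base:
                pattern_growth(cond_base, pattern, min_sup, results)

    # The nine transactions of the running example (Table 6.1), items in
    # descending support order: I2, I1, I3, I4, I5.
    db = [(("I2", "I1", "I5"), 1), (("I2", "I4"), 1), (("I2", "I3"), 1),
          (("I2", "I1", "I4"), 1), (("I1", "I3"), 1), (("I2", "I3"), 1),
          (("I1", "I3"), 1), (("I2", "I1", "I3", "I5"), 1), (("I2", "I1", "I3"), 1)]
    results = {}
    pattern_growth(db, (), min_sup=2, results=results)
    print(results[frozenset(("I2", "I1", "I3"))])  # -> 2, as in the I3 row of Table 6.2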
                                 When the database is large, it is sometimes unrealistic to construct a main memory-
                               based FP-tree. An interesting alternative is to first partition the database into a set
                               of projected databases, and then construct an FP-tree and mine it in each projected
                               database. This process can be recursively applied to any projected database if its FP-tree
                               still cannot fit in main memory.
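As a rough illustration of the projection idea (the particular projection strategy below is an assumption for the sketch, not the book's prescription), the following Python code splits a transaction database into item-projected databases, each holding the prefix paths of the transactions containing that item. An FP-tree can then be built and mined over each projected database separately, and the projection repeated if a projected database is still too large.

    from collections import Counter, defaultdict

    def project_database(db, min_sup):
        # Build one projected database per frequent item, holding the prefix
        # paths (in FP-tree order) of the transactions that contain the item.
        support = Counter(item for tx in db for item in tx)
        frequent = sorted((i for i in support if support[i] >= min_sup),
                          key=lambda i: (-support[i], i))
        rank = {item: r for r, item in enumerate(frequent)}

        projected = defaultdict(list)
        for tx in db:
            # keep frequent items, ordered by descending support
            path = sorted((i for i in tx if i in rank), key=rank.get)
            for pos in range(1, len(path)):
                projected[path[pos]].append(tuple(path[:pos]))
        return dict(projected)

    db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
          {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
          {"I1", "I2", "I3"}]
    for item, paths in project_database(db, min_sup=2).items():
        print(item, paths)   # e.g. I5 [('I2', 'I1'), ('I2', 'I1', 'I3')]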
A performance study of the FP-growth method shows that it is efficient and scalable for mining both long and short frequent patterns, and that it is about an order of magnitude faster than the Apriori algorithm.

                         6.2.5 Mining Frequent Itemsets Using the Vertical Data Format
                               Both the Apriori and FP-growth methods mine frequent patterns from a set of trans-
                               actions in TID-itemset format (i.e., {TID : itemset}), where TID is a transaction ID
                               and itemset is the set of items bought in transaction TID. This is known as the
horizontal data format. Alternatively, data can be presented in item-TID_set format (i.e., {item : TID_set}), where item is an item name and TID_set is the set of identifiers of the transactions containing the item. This is known as the vertical data format.
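A minimal sketch of the conversion between the two layouts (the dictionary-based representation is just an illustration): the horizontal {TID : itemset} database is inverted into an {item : TID_set} map, after which the support of an itemset can be read off as the size of the intersection of its items' TID sets.

    from collections import defaultdict

    def to_vertical(horizontal_db):
        # Invert a horizontal {TID: itemset} database into the vertical
        # {item: TID_set} format.
        vertical = defaultdict(set)
        for tid, itemset in horizontal_db.items():
            for item in itemset:
                vertical[item].add(tid)
        return dict(vertical)

    horizontal = {"T100": {"I1", "I2", "I5"}, "T200": {"I2", "I4"},
                  "T300": {"I2", "I3"}}
    vertical = to_vertical(horizontal)
    print(vertical["I2"])                       # {'T100', 'T200', 'T300'} (in some order)
    # the support count of {I1, I2} is the size of the TID-set intersection
    print(len(vertical["I1"] & vertical["I2"]))  # -> 1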