Page 295 -

P. 295

HAN
2011/6/1
13-ch06-243-278-9780123814791
258 Chapter 6 Mining Frequent Patterns, Associations, and Correlations 3:20 Page 258 #16

null{}
Support
count I2:7
Item ID Node-link I1:2
I2 7 I1:4 I4:1
I1 6 I3:2 I3:2
I3 6
I4 2 I5:1
I5 2
I4:1
I3:2

I5:1

Figure 6.7 An FP-tree registers compressed, frequent pattern information.

when considering the branch to be added for a transaction, the count of each node along
a common preﬁx is incremented by 1, and nodes for the items following the preﬁx are
created and linked accordingly.
To facilitate tree traversal, an item header table is built so that each item points to its
occurrences in the tree via a chain of node-links. The tree obtained after scanning all
the transactions is shown in Figure 6.7 with the associated node-links. In this way, the
problem of mining frequent patterns in databases is transformed into that of mining the
FP-tree.
The FP-tree is mined as follows. Start from each frequent length-1 pattern (as an
initial sufﬁx pattern), construct its conditional pattern base (a “sub-database,” which
consists of the set of preﬁx paths in the FP-tree co-occurring with the sufﬁx pattern),
then construct its (conditional) FP-tree, and perform mining recursively on the tree. The
pattern growth is achieved by the concatenation of the sufﬁx pattern with the frequent
patterns generated from a conditional FP-tree.
Mining of the FP-tree is summarized in Table 6.2 and detailed as follows. We ﬁrst
consider I5, which is the last item in L, rather than the ﬁrst. The reason for starting at
the end of the list will become apparent as we explain the FP-tree mining process. I5
occurs in two FP-tree branches of Figure 6.7. (The occurrences of I5 can easily be found
by following its chain of node-links.) The paths formed by these branches are hI2, I1,
I5: 1i and hI2, I1, I3, I5: 1i. Therefore, considering I5 as a sufﬁx, its corresponding two
preﬁx paths are hI2, I1: 1i and hI2, I1, I3: 1i, which form its conditional pattern base.
Using this conditional pattern base as a transaction database, we build an I5-conditional
FP-tree, which contains only a single path, hI2: 2, I1: 2i; I3 is not included because its
support count of 1 is less than the minimum support count. The single path generates
all the combinations of frequent patterns: {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}.
For I4, its two preﬁx paths form the conditional pattern base, {{I2 I1: 1}, {I2: 1}},
which generates a single-node conditional FP-tree, hI2: 2i, and derives one frequent
pattern, {I2, I4: 2}.

290 291 292 293 294 295 296 297 298 299 300