Page 295 -
P. 295

HAN
                                                            2011/6/1
                               13-ch06-243-278-9780123814791
          258   Chapter 6 Mining Frequent Patterns, Associations, and Correlations  3:20 Page 258  #16



                                                                 null{}
                               Support
                                count             I2:7
                         Item ID     Node-link                           I1:2
                              I2  7         I1:4             I4:1
                              I1  6                    I3:2        I3:2
                              I3  6
                              I4  2      I5:1
                              I5  2
                                                        I4:1
                                              I3:2

                                                    I5:1

               Figure 6.7 An FP-tree registers compressed, frequent pattern information.




                         when considering the branch to be added for a transaction, the count of each node along
                         a common prefix is incremented by 1, and nodes for the items following the prefix are
                         created and linked accordingly.
                           To facilitate tree traversal, an item header table is built so that each item points to its
                         occurrences in the tree via a chain of node-links. The tree obtained after scanning all
                         the transactions is shown in Figure 6.7 with the associated node-links. In this way, the
                         problem of mining frequent patterns in databases is transformed into that of mining the
                         FP-tree.
                           The FP-tree is mined as follows. Start from each frequent length-1 pattern (as an
                         initial suffix pattern), construct its conditional pattern base (a “sub-database,” which
                         consists of the set of prefix paths in the FP-tree co-occurring with the suffix pattern),
                         then construct its (conditional) FP-tree, and perform mining recursively on the tree. The
                         pattern growth is achieved by the concatenation of the suffix pattern with the frequent
                         patterns generated from a conditional FP-tree.
                           Mining of the FP-tree is summarized in Table 6.2 and detailed as follows. We first
                         consider I5, which is the last item in L, rather than the first. The reason for starting at
                         the end of the list will become apparent as we explain the FP-tree mining process. I5
                         occurs in two FP-tree branches of Figure 6.7. (The occurrences of I5 can easily be found
                         by following its chain of node-links.) The paths formed by these branches are hI2, I1,
                         I5: 1i and hI2, I1, I3, I5: 1i. Therefore, considering I5 as a suffix, its corresponding two
                         prefix paths are hI2, I1: 1i and hI2, I1, I3: 1i, which form its conditional pattern base.
                         Using this conditional pattern base as a transaction database, we build an I5-conditional
                         FP-tree, which contains only a single path, hI2: 2, I1: 2i; I3 is not included because its
                         support count of 1 is less than the minimum support count. The single path generates
                         all the combinations of frequent patterns: {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}.
                           For I4, its two prefix paths form the conditional pattern base, {{I2 I1: 1}, {I2: 1}},
                         which generates a single-node conditional FP-tree, hI2: 2i, and derives one frequent
                         pattern, {I2, I4: 2}.
   290   291   292   293   294   295   296   297   298   299   300