

Table 6.2 Mining the FP-Tree by Creating Conditional (Sub-)Pattern Bases

Item  Conditional Pattern Base          Conditional FP-tree        Frequent Patterns Generated
I5    {{I2, I1: 1}, {I2, I1, I3: 1}}    ⟨I2: 2, I1: 2⟩             {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4    {{I2, I1: 1}, {I2: 1}}            ⟨I2: 2⟩                    {I2, I4: 2}
I3    {{I2, I1: 2}, {I2: 2}, {I1: 2}}   ⟨I2: 4, I1: 2⟩, ⟨I1: 2⟩    {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}
I1    {{I2: 4}}                         ⟨I2: 4⟩                    {I2, I1: 4}



[Figure: root null{}, one branch I2:4 → I1:2 and one branch I1:2; the header table lists I2 (support count 4, node-link) and I1 (support count 4, node-link).]

Figure 6.8 The conditional FP-tree associated with the conditional node I3.

Similar to the preceding analysis, I3's conditional pattern base is {{I2, I1: 2}, {I2: 2}, {I1: 2}}. Its conditional FP-tree has two branches, ⟨I2: 4, I1: 2⟩ and ⟨I1: 2⟩, as shown in Figure 6.8, which generates the set of patterns {{I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}}. Finally, I1's conditional pattern base is {{I2: 4}}, with an FP-tree that contains only one node, ⟨I2: 4⟩, which generates one frequent pattern, {I2, I1: 4}. This mining process is summarized in Figure 6.9.
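To make the counting in Table 6.2 concrete, the following small Python check (an illustration, not the book's pseudocode) sums, over the entries of a conditional pattern base, the count of every item combination occurring in an entry, and keeps each combination that reaches the minimum support count together with the suffix item. Run on I3's conditional pattern base with a minimum support count of 2, it reproduces the I3 row of the table.

    from collections import Counter
    from itertools import combinations

    def patterns_for_suffix(cond_base, suffix, min_sup):
        # Brute-force check of one Table 6.2 row: count every item combination
        # appearing in the conditional pattern base and keep those meeting
        # min_sup, each extended with the suffix item.
        support = Counter()
        for prefix_path, count in cond_base:
            for r in range(1, len(prefix_path) + 1):
                for combo in combinations(prefix_path, r):
                    support[frozenset(combo)] += count
        return {tuple(sorted(items)) + (suffix,): cnt
                for items, cnt in support.items() if cnt >= min_sup}

    # I3's conditional pattern base: {{I2, I1: 2}, {I2: 2}, {I1: 2}}
    i3_base = [(("I2", "I1"), 2), (("I2",), 2), (("I1",), 2)]
    print(patterns_for_suffix(i3_base, "I3", min_sup=2))
    # -> {('I2', 'I3'): 4, ('I1', 'I3'): 4, ('I1', 'I2', 'I3'): 2}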

                                 The FP-growth method transforms the problem of finding long frequent patterns
                               into searching for shorter ones in much smaller conditional databases recursively and
                               then concatenating the suffix. It uses the least frequent items as a suffix, offering good
                               selectivity. The method substantially reduces the search costs.
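The recursion can be sketched compactly. The following Python outline is an illustrative simplification, not the book's algorithm: each conditional database is kept as a plain list of (transaction, count) pairs rather than as a conditional FP-tree. It grows patterns by appending each locally frequent item to the current suffix and recursing on that item's conditional pattern base; the transaction data is the running example (Table 6.1), with items listed in descending support order.

    from collections import Counter

    def pattern_growth(db, suffix, min_sup, results):
        # db: list of (transaction, count) pairs; each transaction is a tuple
        # of items in a fixed global order (descending support). A real
        # FP-growth implementation stores each conditional database as a
        # conditional FP-tree; plain lists keep this sketch short.
        support = Counter()
        for items, count in db:
            for item in items:
                support[item] += count
        for item, sup in support.items():
            if sup < min_sup:
                continue
            pattern = suffix + (item,)      # grow by concatenating the suffix
            results[frozenset(pattern)] = sup
            # item's conditional pattern base: the prefix of every transaction
            # that contains it (the fixed order ensures each pattern is
            # generated exactly once)
            cond_base = [(items[:items.index(item)], count)
                         for items, count in db
                         if item in items and items.index(item) > 0]
            if cond_base:
                pattern_growth(cond_base, pattern, min_sup, results)

    # The nine transactions of the running example (Table 6.1), items in
    # descending support order: I2, I1, I3, I4, I5.
    db = [(("I2", "I1", "I5"), 1), (("I2", "I4"), 1), (("I2", "I3"), 1),
          (("I2", "I1", "I4"), 1), (("I1", "I3"), 1), (("I2", "I3"), 1),
          (("I1", "I3"), 1), (("I2", "I1", "I3", "I5"), 1), (("I2", "I1", "I3"), 1)]
    results = {}
    pattern_growth(db, (), min_sup=2, results=results)
    print(results[frozenset(("I2", "I1", "I3"))])  # -> 2, as in the I3 row of Table 6.2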
                                 When the database is large, it is sometimes unrealistic to construct a main memory-
                               based FP-tree. An interesting alternative is to first partition the database into a set
                               of projected databases, and then construct an FP-tree and mine it in each projected
                               database. This process can be recursively applied to any projected database if its FP-tree
                               still cannot fit in main memory.
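As a rough illustration of the projection idea (the particular projection strategy below is an assumption for the sketch, not the book's prescription), the following Python code splits a transaction database into item-projected databases, each holding the prefix paths of the transactions containing that item. An FP-tree can then be built and mined over each projected database separately, and the projection repeated if a projected database is still too large.

    from collections import Counter, defaultdict

    def project_database(db, min_sup):
        # Build one projected database per frequent item, holding the prefix
        # paths (in FP-tree order) of the transactions that contain the item.
        support = Counter(item for tx in db for item in tx)
        frequent = sorted((i for i in support if support[i] >= min_sup),
                          key=lambda i: (-support[i], i))
        rank = {item: r for r, item in enumerate(frequent)}

        projected = defaultdict(list)
        for tx in db:
            # keep frequent items, ordered by descending support
            path = sorted((i for i in tx if i in rank), key=rank.get)
            for pos in range(1, len(path)):
                projected[path[pos]].append(tuple(path[:pos]))
        return dict(projected)

    db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
          {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
          {"I1", "I2", "I3"}]
    for item, paths in project_database(db, min_sup=2).items():
        print(item, paths)   # e.g. I5 [('I2', 'I1'), ('I2', 'I1', 'I3')]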
A performance study of the FP-growth method shows that it is efficient and scalable for mining both long and short frequent patterns, and that it is about an order of magnitude faster than the Apriori algorithm.

                         6.2.5 Mining Frequent Itemsets Using the Vertical Data Format
                               Both the Apriori and FP-growth methods mine frequent patterns from a set of trans-
                               actions in TID-itemset format (i.e., {TID : itemset}), where TID is a transaction ID
                               and itemset is the set of items bought in transaction TID. This is known as the
horizontal data format. Alternatively, data can be presented in item-TID_set format (i.e., {item : TID_set}), where item is an item name and TID_set is the set of identifiers of the transactions containing the item. This is known as the vertical data format.
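A minimal sketch of the conversion between the two layouts (the dictionary-based representation is just an illustration): the horizontal {TID : itemset} database is inverted into an {item : TID_set} map, after which the support of an itemset can be read off as the size of the intersection of its items' TID sets.

    from collections import defaultdict

    def to_vertical(horizontal_db):
        # Invert a horizontal {TID: itemset} database into the vertical
        # {item: TID_set} format.
        vertical = defaultdict(set)
        for tid, itemset in horizontal_db.items():
            for item in itemset:
                vertical[item].add(tid)
        return dict(vertical)

    horizontal = {"T100": {"I1", "I2", "I5"}, "T200": {"I2", "I4"},
                  "T300": {"I2", "I3"}}
    vertical = to_vertical(horizontal)
    print(vertical["I2"])                       # {'T100', 'T200', 'T300'} (in some order)
    # the support count of {I1, I2} is the size of the TID-set intersection
    print(len(vertical["I1"] & vertical["I2"]))  # -> 1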