Page 297 -
P. 297

13-ch06-243-278-9780123814791
                                                            2011/6/1
                         HAN
          260   Chapter 6 Mining Frequent Patterns, Associations, and Correlations  3:20 Page 260  #18



                         Algorithm: FP growth. Mine frequent itemsets using an FP-tree by pattern fragment growth.
                         Input:
                             D, a transaction database;
                             min sup, the minimum support count threshold.
                         Output: The complete set of frequent patterns.
                         Method:

                         1. The FP-tree is constructed in the following steps:
                           (a) Scan the transaction database D once. Collect F, the set of frequent items, and their
                              support counts. Sort F in support count descending order as L, the list of frequent items.
                           (b) Create the root of an FP-tree, and label it as “null.” For each transaction Trans in D do the
                              following.
                              Select and sort the frequent items in Trans according to the order of L. Let the sorted
                              frequent item list in Trans be [p|P], where p is the first element and P is the remaining
                              list. Call insert tree([p|P], T), which is performed as follows. If T has a child N such that
                              N.item-name = p.item-name, then increment N’s count by 1; else create a new node N,
                              and let its count be 1, its parent link be linked to T, and its node-link to the nodes with
                              the same item-name via the node-link structure. If P is nonempty, call insert tree(P, N)
                              recursively.
                         2. The FP-tree is mined by calling FP growth(FP tree, null), which is implemented as follows.
                         procedure FP growth(Tree, α)
                         (1)  if Tree contains a single path P then
                         (2)    for each combination (denoted as β) of the nodes in the path P
                         (3)      generate pattern β ∪ α with support count = minimum support count of nodes in β;
                         (4)  else for each a i in the header of Tree {
                         (5)    generate pattern β = a i ∪ α with support count = a i .support count;
                         (6)    construct β’s conditional pattern base and then β’s conditional FP tree Tree β ;
                         (7)    if Tree β 6= ∅ then
                         (8)      call FP growth(Tree β , β); }

               Figure 6.9 FP-growth algorithm for discovering frequent itemsets without candidate generation.


                         (i.e., {item : TID set}), where item is an item name, and TID set is the set of transaction
                         identifiers containing the item. This is known as the vertical data format.
                           In this subsection, we look at how frequent itemsets can also be mined effi-
                         ciently using vertical data format, which is the essence of the Eclat (Equivalence Class
                         Transformation) algorithm.
            Example 6.6 Mining frequent itemsets using the vertical data format. Consider the horizontal
                         data format of the transaction database, D, of Table 6.1 in Example 6.3. This can be
                         transformed into the vertical data format shown in Table 6.3 by scanning the data
                         set once.
                           Mining can be performed on this data set by intersecting the TID sets of every pair
                         of frequent single items. The minimum support count is 2. Because every single item is
   292   293   294   295   296   297   298   299   300   301   302