Page 297 -

P. 297

13-ch06-243-278-9780123814791
2011/6/1
HAN
260 Chapter 6 Mining Frequent Patterns, Associations, and Correlations 3:20 Page 260 #18

Algorithm: FP growth. Mine frequent itemsets using an FP-tree by pattern fragment growth.
Input:
D, a transaction database;
min sup, the minimum support count threshold.
Output: The complete set of frequent patterns.
Method:

1. The FP-tree is constructed in the following steps:
(a) Scan the transaction database D once. Collect F, the set of frequent items, and their
support counts. Sort F in support count descending order as L, the list of frequent items.
(b) Create the root of an FP-tree, and label it as “null.” For each transaction Trans in D do the
following.
Select and sort the frequent items in Trans according to the order of L. Let the sorted
frequent item list in Trans be [p|P], where p is the ﬁrst element and P is the remaining
list. Call insert tree([p|P], T), which is performed as follows. If T has a child N such that
N.item-name = p.item-name, then increment N’s count by 1; else create a new node N,
and let its count be 1, its parent link be linked to T, and its node-link to the nodes with
the same item-name via the node-link structure. If P is nonempty, call insert tree(P, N)
recursively.
2. The FP-tree is mined by calling FP growth(FP tree, null), which is implemented as follows.
procedure FP growth(Tree, α)
(1) if Tree contains a single path P then
(2) for each combination (denoted as β) of the nodes in the path P
(3) generate pattern β ∪ α with support count = minimum support count of nodes in β;
(4) else for each a i in the header of Tree {
(5) generate pattern β = a i ∪ α with support count = a i .support count;
(6) construct β’s conditional pattern base and then β’s conditional FP tree Tree β ;
(7) if Tree β 6= ∅ then
(8) call FP growth(Tree β , β); }

Figure 6.9 FP-growth algorithm for discovering frequent itemsets without candidate generation.

(i.e., {item : TID set}), where item is an item name, and TID set is the set of transaction
identiﬁers containing the item. This is known as the vertical data format.
In this subsection, we look at how frequent itemsets can also be mined efﬁ-
ciently using vertical data format, which is the essence of the Eclat (Equivalence Class
Transformation) algorithm.
Example 6.6 Mining frequent itemsets using the vertical data format. Consider the horizontal
data format of the transaction database, D, of Table 6.1 in Example 6.3. This can be
transformed into the vertical data format shown in Table 6.3 by scanning the data
set once.
Mining can be performed on this data set by intersecting the TID sets of every pair
of frequent single items. The minimum support count is 2. Because every single item is

292 293 294 295 296 297 298 299 300 301 302