Page 311 -
P. 311
13-ch06-243-278-9780123814791
2011/6/1
HAN
274 Chapter 6 Mining Frequent Patterns, Associations, and Correlations 3:20 Page 274 #32
TID items bought
T100 {M, O, N, K, E, Y}
T200 {D, O, N, K, E, Y }
T300 {M, A, K, E}
T400 {M, U, C, K, Y}
T500 {C, O, O, K, I, E}
(a) Find all frequent itemsets using Apriori and FP-growth, respectively. Compare
the efficiency of the two mining processes.
(b) List all the strong association rules (with support s and confidence c) matching
the following metarule, where X is a variable representing customers, and item i
denotes variables representing items (e.g., “A,” “B,”):
∀x ∈ transaction, buys(X,item 1 ) ∧ buys(X,item 2 ) ⇒ buys(X,item 3 ) [s,c]
6.7 (Implementation project) Using a programming language that you are familiar
with, such as C++ or Java, implement three frequent itemset mining algorithms
introduced in this chapter: (1) Apriori [AS94b], (2) FP-growth [HPY00], and
(3) Eclat [Zak00] (mining using the vertical data format). Compare the perfor-
mance of each algorithm with various kinds of large data sets. Write a report to
analyze the situations (e.g., data size, data distribution, minimal support thresh-
old setting, and pattern density) where one algorithm may perform better than the
others, and state why.
6.8 A database has four transactions. Let min sup = 60% and min conf = 80%.
cust ID TID items bought (in the form of brand-item category)
01 T100 {King’s-Crab, Sunset-Milk, Dairyland-Cheese, Best-Bread}
02 T200 {Best-Cheese, Dairyland-Milk, Goldenfarm-Apple, Tasty-Pie, Wonder-Bread}
01 T300 {Westcoast-Apple, Dairyland-Milk, Wonder-Bread, Tasty-Pie}
03 T400 {Wonder-Bread, Sunset-Milk, Dairyland-Cheese}
(a) At the granularity of item category (e.g., item i could be “Milk”), for the rule
template,
∀X ∈ transaction, buys(X,item 1 ) ∧ buys(X,item 2 ) ⇒ buys(X,item 3 ) [s,c],
list the frequent k-itemset for the largest k, and all the strong association rules
(with their support s and confidence c) containing the frequent k-itemset for the
largest k.
(b) At the granularity of brand-item category (e.g., item i could be “Sunset-Milk”),
for the rule template,
∀X ∈ customer, buys(X,item 1 ) ∧ buys(X,item 2 ) ⇒ buys(X,item 3 ),
list the frequent k-itemset for the largest k (but do not print any rules).