Page 309 -

P. 309

2011/6/1
13-ch06-243-278-9780123814791
HAN
272 Chapter 6 Mining Frequent Patterns, Associations, and Correlations 3:20 Page 272 #30

customers’ buying habits by searching for itemsets that are frequently purchased
together (or in sequence).
Association rule mining consists of ﬁrst ﬁnding frequent itemsets (sets of items,
such as A and B, satisfying a minimum support threshold, or percentage of the task-
relevant tuples), from which strong association rules in the form of A ⇒ B are
generated. These rules also satisfy a minimum conﬁdence threshold (a prespeciﬁed
probability of satisfying B under the condition that A is satisﬁed). Associations can be
further analyzed to uncover correlation rules, which convey statistical correlations
between itemsets A and B.
Many efﬁcient and scalable algorithms have been developed for frequent itemset
mining, from which association and correlation rules can be derived. These algo-
rithms can be classiﬁed into three categories: (1) Apriori-like algorithms, (2) frequent
pattern growth–based algorithms such as FP-growth, and (3) algorithms that use the
vertical data format.
The Apriori algorithm is a seminal algorithm for mining frequent itemsets for
Boolean association rules. It explores the level-wise mining Apriori property that all
nonempty subsets of a frequent itemset must also be frequent. At the kth iteration (for
k ≥ 2), it forms frequent k-itemset candidates based on the frequent (k − 1)-itemsets,
and scans the database once to ﬁnd the complete set of frequent k-itemsets, L k .
Variations involving hashing and transaction reduction can be used to make the
procedure more efﬁcient. Other variations include partitioning the data (mining on
each partition and then combining the results) and sampling the data (mining on
a data subset). These variations can reduce the number of data scans required to as
little as two or even one.
Frequent pattern growth is a method of mining frequent itemsets without candidate
generation. It constructs a highly compact data structure (an FP-tree) to compress the
originaltransactiondatabase.Ratherthanemployingthegenerate-and-teststrategyof
Apriori-like methods, it focuses on frequent pattern (fragment) growth, which avoids
costly candidate generation, resulting in greater efﬁciency.
Mining frequent itemsets using the vertical data format (Eclat) is a method that
transforms a given data set of transactions in the horizontal data format of TID-
itemset into the vertical data format of item-TID set. It mines the transformed
data set by TID set intersections based on the Apriori property and additional
optimization techniques such as diffset.

Not all strong association rules are interesting. Therefore, the support–conﬁdence
framework should be augmented with a pattern evaluation measure, which promotes
the mining of interesting rules. A measure is null-invariant if its value is free from
the inﬂuence of null-transactions (i.e., the transactions that do not contain any of
the itemsets being examined). Among many pattern evaluation measures, we exam-
2
ined lift, χ , all conﬁdence, max conﬁdence, Kulczynski, and cosine, and showed

304 305 306 307 308 309 310 311 312 313 314