Page 283 -

P. 283

HAN 13-ch06-243-278-9780123814791
2011/6/1

246 Chapter 6 Mining Frequent Patterns, Associations, and Correlations 3:20 Page 246 #4

6.1.2 Frequent Itemsets, Closed Itemsets,
and Association Rules
Let I = {I 1 , I 2 ,..., I m } be an itemset. Let D, the task-relevant data, be a set of database
transactions where each transaction T is a nonempty itemset such that T ⊆ I. Each
transaction is associated with an identiﬁer, called a TID. Let A be a set of items. A trans-
action T is said to contain A if A ⊆ T. An association rule is an implication of the form
A ⇒ B, where A ⊂ I, B ⊂ I, A 6= ∅, B 6= ∅, and A ∩ B = φ. The rule A ⇒ B holds in the
transaction set D with support s, where s is the percentage of transactions in D that
contain A ∪ B (i.e., the union of sets A and B say, or, both A and B). This is taken to be
1
the probability, P(A ∪ B). The rule A ⇒ B has conﬁdence c in the transaction set D,
where c is the percentage of transactions in D containing A that also contain B. This is
taken to be the conditional probability, P(B|A). That is,

support(A⇒B) =P(A ∪ B) (6.2)
conﬁdence(A⇒B) =P(B|A). (6.3)

Rules that satisfy both a minimum support threshold (min sup) and a minimum con-
ﬁdence threshold (min conf ) are called strong. By convention, we write support and
conﬁdence values so as to occur between 0% and 100%, rather than 0 to 1.0.
2
A set of items is referred to as an itemset. An itemset that contains k items is a
k-itemset. The set {computer, antivirus software} is a 2-itemset. The occurrence fre-
quency of an itemset is the number of transactions that contain the itemset. This is
also known, simply, as the frequency, support count, or count of the itemset. Note
that the itemset support deﬁned in Eq. (6.2) is sometimes referred to as relative support,
whereas the occurrence frequency is called the absolute support. If the relative support
of an itemset I satisﬁes a prespeciﬁed minimum support threshold (i.e., the absolute
support of I satisﬁes the corresponding minimum support count threshold), then I is
3
a frequent itemset. The set of frequent k-itemsets is commonly denoted by L k . 4
From Eq. (6.3), we have

support(A ∪ B) support count(A ∪ B)
conﬁdence(A⇒B) = P(B|A) = = . (6.4)
support(A) support count(A)

1 Notice that the notation P(A ∪ B) indicates the probability that a transaction contains the union of sets
A and B (i.e., it contains every item in A and B). This should not be confused with P(A or B), which
indicates the probability that a transaction contains either A or B.
2 In the data mining research literature, “itemset” is more commonly used than “item set.”
3 In early work, itemsets satisfying minimum support were referred to as large. This term, however,
is somewhat confusing as it has connotations of the number of items in an itemset rather than the
frequency of occurrence of the set. Hence, we use the more recent term frequent.
4 Although the term frequent is preferred over large, for historic reasons frequent k-itemsets are still
denoted as L k .

278 279 280 281 282 283 284 285 286 287 288