248   Chapter 6 Mining Frequent Patterns, Associations, and Correlations


                         Thus, we say that C contains complete information regarding its corresponding frequent
                         itemsets. On the other hand, M registers only the support of the maximal itemsets. It
                         usually does not contain the complete support information regarding its corresponding
                         frequent itemsets. We illustrate these concepts with Example 6.2.

            Example 6.2 Closed and maximal frequent itemsets. Suppose that a transaction database has only
                         two transactions: {⟨a_1, a_2, ..., a_100⟩; ⟨a_1, a_2, ..., a_50⟩}. Let the minimum support count
                         threshold be min_sup = 1. We find two closed frequent itemsets and their support
                         counts, that is, C = {{a_1, a_2, ..., a_100} : 1; {a_1, a_2, ..., a_50} : 2}. There is only one max-
                         imal frequent itemset: M = {{a_1, a_2, ..., a_100} : 1}. Notice that we cannot include
                         {a_1, a_2, ..., a_50} as a maximal frequent itemset because it has a frequent superset,
                         {a_1, a_2, ..., a_100}. Compare this to the preceding, where we determined that there are
                         2^100 − 1 frequent itemsets, which are too many to be enumerated!
                           The set of closed frequent itemsets contains complete information regarding the fre-
                         quent itemsets. For example, from C, we can derive, say, (1) {a_2, a_45 : 2}, since {a_2, a_45} is
                         a sub-itemset of the itemset {a_1, a_2, ..., a_50 : 2}; and (2) {a_8, a_55 : 1}, since {a_8, a_55} is not
                         a sub-itemset of the previous itemset, but of the itemset {a_1, a_2, ..., a_100 : 1}. However,
                         from the maximal frequent itemset, we can only assert that both itemsets ({a_2, a_45} and
                         {a_8, a_55}) are frequent, but we cannot assert their actual support counts.
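The derivation just described can be sketched in code. The following is a minimal illustration (not from the text itself): it encodes the two closed itemsets of Example 6.2 and recovers the support of any frequent itemset as the maximum support over its closed supersets. The names `C` and `support` are illustrative choices, not part of any standard library.

```python
# Closed frequent itemsets from Example 6.2, mapped to their support counts.
# {a1, ..., a100} has support 1; {a1, ..., a50} has support 2.
C = {
    frozenset(f"a{i}" for i in range(1, 101)): 1,
    frozenset(f"a{i}" for i in range(1, 51)): 2,
}

def support(itemset, closed):
    """Support of an itemset = maximum support over the closed itemsets
    that contain it; 0 (i.e., not frequent) if no closed superset exists."""
    counts = [cnt for c, cnt in closed.items() if itemset <= c]
    return max(counts, default=0)

print(support(frozenset({"a2", "a45"}), C))  # 2: contained in {a1..a50}
print(support(frozenset({"a8", "a55"}), C))  # 1: contained only in {a1..a100}
```

Note that the same lookup against the maximal set M = {{a_1, ..., a_100} : 1} alone would report support 1 for both queries, which is exactly the information loss the text describes.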



                 6.2     Frequent Itemset Mining Methods


                         In this section, you will learn methods for mining the simplest form of frequent
                         patterns, such as those discussed for market basket analysis in Section 6.1.1. We begin by
                         presenting Apriori, the basic algorithm for finding frequent itemsets (Section 6.2.1). In
                         Section 6.2.2, we look at how to generate strong association rules from frequent item-
                         sets. Section 6.2.3 describes several variations to the Apriori algorithm for improved
                         efficiency and scalability. Section 6.2.4 presents pattern-growth methods for mining
                         frequent itemsets that confine the subsequent search space to only the data sets contain-
                         ing the current frequent itemsets. Section 6.2.5 presents methods for mining frequent
                         itemsets that take advantage of the vertical data format.


                   6.2.1 Apriori Algorithm: Finding Frequent Itemsets
                         by Confined Candidate Generation
                         Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for min-
                         ing frequent itemsets for Boolean association rules [AS94b]. The name of the algorithm
                         is based on the fact that the algorithm uses prior knowledge of frequent itemset prop-
                         erties, as we shall see later. Apriori employs an iterative approach known as a level-wise
                         search, where k-itemsets are used to explore (k + 1)-itemsets. First, the set of frequent
                         1-itemsets is found by scanning the database to accumulate the count for each item, and