Page 310 -
P. 310
13-ch06-243-278-9780123814791
#31
2011/6/1
3:20 Page 273
HAN
6.5 Exercises 273
that only the latter four are null-invariant. We suggest using the Kulczynski measure,
together with the imbalance ratio, to present pattern relationships among itemsets.
6.5 Exercises
6.1 Suppose you have the set C of all frequent closed itemsets on a data set D, as well
as the support count for each frequent closed itemset. Describe an algorithm to
determine whether a given itemset X is frequent or not, and the support of X if it
is frequent.
6.2 An itemset X is called a generator on a data set D if there does not exist a proper
sub-itemset Y ⊂ X such that support(X) = support(Y). A generator X is a frequent
generator if support(X) passes the minimum support threshold. Let G be the set of
all frequent generators on a data set D.
(a) Can you determine whether an itemset A is frequent and the support of A, if it
is frequent, using only G and the support counts of all frequent generators? If
yes, present your algorithm. Otherwise, what other information is needed? Can
you give an algorithm assuming the information needed is available?
(b) What is the relationship between closed itemsets and generators?
6.3 The Apriori algorithm makes use of prior knowledge of subset support properties.
(a) Prove that all nonempty subsets of a frequent itemset must also be frequent.
0
(b) Prove that the support of any nonempty subset s of itemset s must be at least
as great as the support of s.
(c) Given frequent itemset l and subset s of l, prove that the confidence of the rule
0
0
0
“s ⇒ (l − s )” cannot be more than the confidence of “s ⇒ (l − s),” where s is
a subset of s.
(d) A partitioning variation of Apriori subdivides the transactions of a database D
into n nonoverlapping partitions. Prove that any itemset that is frequent in D
must be frequent in at least one partition of D.
6.4 Let c be a candidate itemset in C k generated by the Apriori algorithm. How many
length-(k − 1) subsets do we need to check in the prune step? Per your previ-
ous answer, can you give an improved version of procedure has infrequent subset
in Figure 6.4?
6.5 Section 6.2.2 describes a method for generating association rules from frequent
itemsets. Propose a more efficient method. Explain why it is more efficient than
the one proposed there. (Hint: Consider incorporating the properties of Exercises
6.3(b), (c) into your design.)
6.6 A database has five transactions. Let min sup = 60% and min conf = 80%.