Page 310 -
P. 310

13-ch06-243-278-9780123814791
                                                                                    #31
                                                            2011/6/1
                                                                     3:20 Page 273
                         HAN
                                                                                    6.5 Exercises  273


                                 that only the latter four are null-invariant. We suggest using the Kulczynski measure,
                                 together with the imbalance ratio, to present pattern relationships among itemsets.


                       6.5     Exercises


                               6.1 Suppose you have the set C of all frequent closed itemsets on a data set D, as well
                                   as the support count for each frequent closed itemset. Describe an algorithm to
                                   determine whether a given itemset X is frequent or not, and the support of X if it
                                   is frequent.
                               6.2 An itemset X is called a generator on a data set D if there does not exist a proper
                                   sub-itemset Y ⊂ X such that support(X) = support(Y). A generator X is a frequent
                                   generator if support(X) passes the minimum support threshold. Let G be the set of
                                   all frequent generators on a data set D.
                                   (a) Can you determine whether an itemset A is frequent and the support of A, if it
                                      is frequent, using only G and the support counts of all frequent generators? If
                                      yes, present your algorithm. Otherwise, what other information is needed? Can
                                      you give an algorithm assuming the information needed is available?
                                  (b) What is the relationship between closed itemsets and generators?
                               6.3 The Apriori algorithm makes use of prior knowledge of subset support properties.
                                   (a) Prove that all nonempty subsets of a frequent itemset must also be frequent.
                                                                             0
                                  (b) Prove that the support of any nonempty subset s of itemset s must be at least
                                      as great as the support of s.
                                   (c) Given frequent itemset l and subset s of l, prove that the confidence of the rule
                                               0
                                       0
                                                                                                  0
                                      “s ⇒ (l − s )” cannot be more than the confidence of “s ⇒ (l − s),” where s is
                                      a subset of s.
                                  (d) A partitioning variation of Apriori subdivides the transactions of a database D
                                      into n nonoverlapping partitions. Prove that any itemset that is frequent in D
                                      must be frequent in at least one partition of D.
                               6.4 Let c be a candidate itemset in C k generated by the Apriori algorithm. How many
                                   length-(k − 1) subsets do we need to check in the prune step? Per your previ-
                                   ous answer, can you give an improved version of procedure has infrequent subset
                                   in Figure 6.4?
                               6.5 Section 6.2.2 describes a method for generating association rules from frequent
                                   itemsets. Propose a more efficient method. Explain why it is more efficient than
                                   the one proposed there. (Hint: Consider incorporating the properties of Exercises
                                   6.3(b), (c) into your design.)
                               6.6 A database has five transactions. Let min sup = 60% and min conf = 80%.
   305   306   307   308   309   310   311   312   313   314   315