          Chapter 6 Mining Frequent Patterns, Associations, and Correlations



                        6.13 Give a short example to show that items in a strong association rule actually may
                             be negatively correlated.
                        6.14 The following contingency table summarizes supermarket transaction data, where
                             hot dogs refers to the transactions containing hot dogs, ¬hot dogs refers to the
                             transactions that do not contain hot dogs, hamburgers refers to the transactions
                             containing hamburgers, and ¬hamburgers refers to the transactions that do not
                             contain hamburgers.

                                                          hot dogs   ¬hot dogs   Σ_row
                                            hamburgers      2000        500       2500
                                            ¬hamburgers     1000       1500       2500
                                            Σ_col           3000       2000       5000
                             (a) Suppose that the association rule “hot dogs ⇒ hamburgers” is mined. Given a
                                minimum support threshold of 25% and a minimum confidence threshold of
                                50%, is this association rule strong?
                            (b) Based on the given data, is the purchase of hot dogs independent of the purchase
                                of hamburgers? If not, what kind of correlation relationship exists between the
                                two?
                             (c) Compare the use of the all_confidence, max_confidence, Kulczynski, and cosine
                                measures with lift and correlation on the given data.
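
                             One way to check the arithmetic for Exercise 6.14 is a minimal Python sketch
                             like the one below. It assumes the usual definitions of the measures
                             (all_confidence as the smaller and max_confidence as the larger of the two
                             conditional probabilities, Kulczynski as their average, cosine as
                             P(A ∪ B) / sqrt(P(A) P(B))) and uses a chi-square statistic for the
                             correlation test. The function names `measures` and `chi_square` are
                             illustrative only and do not come from the text.

```python
from math import sqrt

# Counts read from the contingency table in Exercise 6.14.
N_BOTH = 2000        # transactions with hot dogs and hamburgers
N_HOT_DOGS = 3000    # column total: transactions with hot dogs
N_HAMBURGERS = 2500  # row total: transactions with hamburgers
N_TOTAL = 5000

# Rows: hamburgers / no hamburgers. Columns: hot dogs / no hot dogs.
TABLE = [[2000, 500],
         [1000, 1500]]


def measures(n_ab, n_a, n_b, n):
    """Evaluate the rule A => B from raw counts (illustrative helper)."""
    conf_a_b = n_ab / n_a  # P(B | A), the confidence of A => B
    conf_b_a = n_ab / n_b  # P(A | B)
    return {
        "support": n_ab / n,
        "confidence": conf_a_b,
        "lift": (n_ab / n) / ((n_a / n) * (n_b / n)),
        "all_confidence": min(conf_a_b, conf_b_a),
        "max_confidence": max(conf_a_b, conf_b_a),
        "kulczynski": 0.5 * (conf_a_b + conf_b_a),
        "cosine": n_ab / sqrt(n_a * n_b),
    }


def chi_square(observed):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2


result = measures(N_BOTH, N_HOT_DOGS, N_HAMBURGERS, N_TOTAL)
for name, value in result.items():
    print(f"{name:15s} {value:.4f}")
print(f"{'chi_square':15s} {chi_square(TABLE):.2f}")
```

                             The support and confidence values can be compared directly against the 25%
                             and 50% thresholds in part (a). A lift different from 1 indicates that the
                             two purchases are not independent, which bears on part (b).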
                        6.15 (Implementation project) The DBLP data set (www.informatik.uni-trier.de/~ley/db/)
                             consists of over one million entries of research papers published in
                             computer science conferences and journals. Among these entries, there
                             are a good number of authors who have coauthor relationships.
                             (a) Propose a method to efficiently mine a set of coauthor relationships that are
                                closely correlated (e.g., often coauthoring papers together).
                            (b) Based on the mining results and the pattern evaluation measures discussed in
                                this chapter, discuss which measure may convincingly uncover close collabora-
                                tion patterns better than others.
                             (c) Based on the study in (a), develop a method that can roughly predict advi-
                                sor and advisee relationships and the approximate period for such advisory
                                supervision.
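
                             As a possible starting point for part (a), the sketch below counts how often
                             each pair of authors appears on the same paper and scores the pairs that meet
                             a minimum support count with the Kulczynski measure. It is only an
                             illustration under stated assumptions: the input is assumed to be already
                             parsed into one author list per paper, the toy data is made up, and the
                             function name `coauthor_kulczynski` and the `min_support` parameter are
                             hypothetical. Reading the actual DBLP files is not shown.

```python
from collections import Counter
from itertools import combinations


def coauthor_kulczynski(papers, min_support=2):
    """Score coauthor pairs by joint paper count and Kulczynski measure.

    papers: iterable of author-name lists, one list per paper (assumed format).
    Returns {(author_a, author_b): (joint_count, kulczynski)} for pairs whose
    joint paper count is at least min_support.
    """
    author_count = Counter()
    pair_count = Counter()
    for authors in papers:
        unique = sorted(set(authors))  # ignore duplicate names, fix pair order
        author_count.update(unique)
        pair_count.update(combinations(unique, 2))

    scored = {}
    for (a, b), n_ab in pair_count.items():
        if n_ab < min_support:
            continue
        # Kulczynski: average of P(b | a) and P(a | b).
        kulc = 0.5 * (n_ab / author_count[a] + n_ab / author_count[b])
        scored[(a, b)] = (n_ab, kulc)
    return scored


# Made-up toy input, not taken from DBLP.
toy_papers = [
    ["A", "B"],
    ["A", "B"],
    ["A", "C"],
    ["B"],
    ["A", "B", "C"],
]
pairs = coauthor_kulczynski(toy_papers, min_support=2)
for pair, (count, kulc) in sorted(pairs.items()):
    print(pair, count, round(kulc, 3))
```

                             Counting only pairs keeps the sketch short. For data at the scale described
                             in the exercise, a frequent itemset miner with a support threshold would
                             likely be needed to prune infrequent authors before pairs are formed.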


                 6.6     Bibliographic Notes


                         Association rule mining was first proposed by Agrawal, Imielinski, and Swami [AIS93].
                         The Apriori algorithm discussed in Section 6.2.1 for frequent itemset mining was pre-
                         sented in Agrawal and Srikant [AS94b]. A variation of the algorithm using a similar
                         pruning heuristic was developed independently by Mannila, Toivonen, and Verkamo