Page 291 -
P. 291

HAN
                                                            2011/6/1
                               13-ch06-243-278-9780123814791
          254   Chapter 6 Mining Frequent Patterns, Associations, and Correlations  3:20 Page 254  #12



                   6.2.2 Generating Association Rules from Frequent Itemsets
                         Once the frequent itemsets from transactions in a database D have been found, it is
                         straightforward to generate strong association rules from them (where strong associa-
                         tion rules satisfy both minimum support and minimum confidence). This can be done
                         using Eq. (6.4) for confidence, which we show again here for completeness:
                                                               support count(A ∪ B)
                                      confidence(A ⇒ B) = P(B|A) =                .
                                                                 support count(A)

                           The conditional probability is expressed in terms of itemset support count, where
                         support count(A ∪ B) is the number of transactions containing the itemsets A ∪ B, and
                         support count(A) is the number of transactions containing the itemset A. Based on this
                         equation, association rules can be generated as follows:

                           For each frequent itemset l, generate all nonempty subsets of l.
                                                                                  support count(l)
                           For every nonempty subset s of l, output the rule “s ⇒ (l − s)” if  ≥
                                                                                  support count(s)
                           min conf, where min conf is the minimum confidence threshold.
                           Because the rules are generated from frequent itemsets, each one automatically satis-
                         fies the minimum support. Frequent itemsets can be stored ahead of time in hash tables
                         along with their counts so that they can be accessed quickly.

            Example 6.4 Generating association rules. Let’s try an example based on the transactional data for
                         AllElectronics shown before in Table 6.1. The data contain frequent itemset X = {I1, I2,
                         I5}. What are the association rules that can be generated from X? The nonempty subsets
                         of X are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, and {I5}. The resulting association rules are
                         as shown below, each listed with its confidence:
                             {I1,I2} ⇒ I5,  confidence = 2/4 = 50%
                             {I1,I5} ⇒ I2,  confidence = 2/2 = 100%
                             {I2,I5} ⇒ I1,  confidence = 2/2 = 100%
                             I1 ⇒ {I2,I5},  confidence = 2/6 = 33%
                             I2 ⇒ {I1,I5},  confidence = 2/7 = 29%
                             I5 ⇒ {I1,I2},  confidence = 2/2 = 100%
                           If the minimum confidence threshold is, say, 70%, then only the second, third, and
                         last rules are output, because these are the only ones generated that are strong. Note
                         that, unlike conventional classification rules, association rules can contain more than
                         one conjunct in the right side of the rule.

                   6.2.3 Improving the Efficiency of Apriori

                         “How can we further improve the efficiency of Apriori-based mining?” Many variations of
                         the Apriori algorithm have been proposed that focus on improving the efficiency of the
                         original algorithm. Several of these variations are summarized as follows:
   286   287   288   289   290   291   292   293   294   295   296