                                                            1.4 What Kinds of Patterns Can Be Mined?  17


                                 Concept description, including characterization and discrimination, is described in
                               Chapter 4.

                         1.4.2 Mining Frequent Patterns, Associations, and Correlations

                               Frequent patterns, as the name suggests, are patterns that occur frequently in data.
                               There are many kinds of frequent patterns, including frequent itemsets, frequent sub-
                               sequences (also known as sequential patterns), and frequent substructures. A frequent
                               itemset typically refers to a set of items that often appear together in a transactional
                               data set—for example, milk and bread, which are frequently bought together in gro-
                               cery stores by many customers. A frequently occurring subsequence, such as the pattern
                               that customers tend to purchase first a laptop, followed by a digital camera, and then
                               a memory card, is a (frequent) sequential pattern. A substructure can refer to different
                               structural forms (e.g., graphs, trees, or lattices) that may be combined with itemsets
                               or subsequences. If a substructure occurs frequently, it is called a (frequent) structured
                               pattern. Mining frequent patterns leads to the discovery of interesting associations and
                               correlations within data.
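The notion of a frequent itemset can be made concrete with a minimal sketch. The function name `frequent_itemsets`, the `min_support` threshold, and the toy grocery transactions below are illustrative assumptions, not taken from the text. The sketch simply counts every item pair by brute force and is not meant to stand in for an efficient mining algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, size=2):
    """Return {itemset: support} for itemsets of `size` items whose
    support (fraction of transactions containing them) >= min_support."""
    counts = Counter()
    for t in transactions:
        # Sort the distinct items so each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(t)), size):
            counts[itemset] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical grocery transactions (assumed data, for illustration only).
transactions = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["bread", "butter"],
    ["milk", "bread", "butter"],
]

frequent_pairs = frequent_itemsets(transactions, min_support=0.5)
print(frequent_pairs)
```

On this toy data, milk and bread appear together in three of the four transactions, so the pair is reported with a support of 0.75.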

                  Example 1.7 Association analysis. Suppose that, as a marketing manager at AllElectronics, you want
                               to know which items are frequently purchased together (i.e., within the same transac-
                               tion). An example of such a rule, mined from the AllElectronics transactional database, is

                                   buys(X, “computer”) ⇒ buys(X, “software”) [support = 1%, confidence = 50%],

                               where X is a variable representing a customer. A confidence, or certainty, of 50%
                               means that if a customer buys a computer, there is a 50% chance that she will buy
                               software as well. A 1% support means that 1% of all the transactions under analysis
                               show that computer and software are purchased together. This association rule involves
                               a single attribute or predicate (i.e., buys) that repeats. Association rules that contain a
                               single predicate are referred to as single-dimensional association rules. Dropping the
                               predicate notation, the rule can be written simply as “computer ⇒ software [1%, 50%].”
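The support and confidence of the rule above can be computed directly from their definitions. The following is a minimal sketch: the function name `rule_metrics` and the 100 made-up transactions are assumptions, chosen only so that the numbers reproduce the 1% support and 50% confidence quoted in the example.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent.

    support    = fraction of all transactions containing both sides
    confidence = fraction of antecedent transactions that also contain
                 the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n if n else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Assumed toy data: 100 transactions, 2 contain a computer,
# and 1 of those also contains software.
transactions = (
    [["computer", "software"]]
    + [["computer"]]
    + [["printer"]] * 98
)

support, confidence = rule_metrics(transactions, ["computer"], ["software"])
print(f"support = {support:.0%}, confidence = {confidence:.0%}")
# prints "support = 1%, confidence = 50%"
```

Note that support is measured against all transactions, whereas confidence is measured only against the transactions that contain the antecedent (here, computer).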
                                 Suppose, instead, that we are given the AllElectronics relational database related to
                               purchases. A data mining system may find association rules like
                                        age(X, “20..29”) ∧ income(X, “40K..49K”) ⇒ buys(X, “laptop”)
                                           [support = 2%, confidence = 60%].
                               The rule indicates that of the AllElectronics customers under study, 2% are 20 to 29 years
                               old with an income of $40,000 to $49,000 and have purchased a laptop (computer)
                               at AllElectronics. There is a 60% probability that a customer in this age and income
                               group will purchase a laptop. Note that this is an association involving more than one
                               attribute or predicate (i.e., age, income, and buys). Adopting the terminology used in
                               multidimensional databases, where each attribute is referred to as a dimension, the
                               above rule can be referred to as a multidimensional association rule.
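A multidimensional rule can be evaluated the same way, with each predicate tested against a customer record rather than against a set of purchased items. The sketch below is illustrative only: the record layout, the function name `multidim_rule_metrics`, and the five sample customers are assumptions, so the resulting figures differ from the 2% and 60% in the text.

```python
def multidim_rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) for a rule whose antecedent is a list
    of predicates over a record and whose consequent is one predicate."""
    n = len(records)
    matches_antecedent = [r for r in records if all(p(r) for p in antecedent)]
    matches_both = [r for r in matches_antecedent if consequent(r)]
    support = len(matches_both) / n if n else 0.0
    confidence = (
        len(matches_both) / len(matches_antecedent) if matches_antecedent else 0.0
    )
    return support, confidence


# Hypothetical customer records (assumed schema: age, income, items bought).
customers = [
    {"age": 25, "income": 45_000, "buys": {"laptop"}},
    {"age": 27, "income": 42_000, "buys": {"laptop", "mouse"}},
    {"age": 22, "income": 48_000, "buys": {"printer"}},
    {"age": 35, "income": 45_000, "buys": {"laptop"}},
    {"age": 24, "income": 60_000, "buys": {"camera"}},
]

# age(X, "20..29") AND income(X, "40K..49K") => buys(X, "laptop")
antecedent = [
    lambda r: 20 <= r["age"] <= 29,
    lambda r: 40_000 <= r["income"] <= 49_000,
]
consequent = lambda r: "laptop" in r["buys"]

md_support, md_confidence = multidim_rule_metrics(customers, antecedent, consequent)
print(f"support = {md_support:.0%}, confidence = {md_confidence:.0%}")
```

Three of the five sample customers satisfy both the age and income predicates, and two of those three bought a laptop, which gives a support of 2/5 and a confidence of 2/3.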