


Attribute subset selection⁴ reduces the data set size by removing irrelevant or redundant attributes (or dimensions). The goal of attribute subset selection is to find a minimum set of attributes such that the resulting probability distribution of the data classes is as close as possible to the original distribution obtained using all attributes. Mining on a reduced set of attributes has an additional benefit: It reduces the number of attributes appearing in the discovered patterns, helping to make the patterns easier to understand.
“How can we find a ‘good’ subset of the original attributes?” For n attributes, there are 2^n possible subsets. An exhaustive search for the optimal subset of attributes can be prohibitively expensive, especially as n and the number of data classes increase. Therefore, heuristic methods that explore a reduced search space are commonly used for attribute subset selection. These methods are typically greedy in that, while searching through attribute space, they always make what looks to be the best choice at the time. Their strategy is to make a locally optimal choice in the hope that this will lead to a globally optimal solution. Such greedy methods are effective in practice and may come close to estimating an optimal solution.
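
As a concrete illustration, the following is a minimal sketch of greedy (stepwise) forward selection in Python. The evaluation function score is an assumed placeholder for whatever attribute evaluation measure is in use (a significance test, information gain, and so on); higher scores are taken to be better.

def forward_selection(attributes, score):
    """Greedy stepwise forward selection: start with an empty set and,
    at each step, add the single attribute that most improves the
    (assumed) evaluation measure `score`; stop when no attribute helps."""
    remaining, selected = set(attributes), set()
    best = score(selected)
    while remaining:
        # Try each remaining attribute added to the current set.
        candidate = max(remaining, key=lambda a: score(selected | {a}))
        cand_score = score(selected | {candidate})
        if cand_score <= best:
            break  # locally optimal: no single attribute improves the score
        selected.add(candidate)
        remaining.remove(candidate)
        best = cand_score
    return selected

Stepwise backward elimination follows the mirror-image strategy: start from the full attribute set and repeatedly remove the attribute whose removal improves (or least degrades) the score.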
The “best” (and “worst”) attributes are typically determined using tests of statistical significance, which assume that the attributes are independent of one another. Many other attribute evaluation measures can be used, such as the information gain measure used in building decision trees for classification.⁵
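
To make the measure concrete, the sketch below computes the information gain of one categorical attribute with respect to the class label (the measure itself is treated in detail in Chapter 8). The parallel lists attr_values and labels are hypothetical inputs describing the training tuples.

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(attr_values, labels):
    """Reduction in class entropy obtained by splitting on an attribute.
    attr_values[i] is the attribute value of tuple i; labels[i] its class."""
    n = len(labels)
    groups = {}  # class labels grouped by attribute value
    for v, c in zip(attr_values, labels):
        groups.setdefault(v, []).append(c)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

For example, information_gain(['y', 'y', 'n', 'n'], ['C1', 'C1', 'C2', 'C1']) measures how much knowing the attribute value reduces uncertainty about the class.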
                           Basic heuristic methods of attribute subset selection include the techniques that
                         follow, some of which are illustrated in Figure 3.6.


Forward selection:
  Initial attribute set: {A1, A2, A3, A4, A5, A6}
  Initial reduced set: {}
  => {A1}
  => {A1, A4}
  => Reduced attribute set: {A1, A4, A6}

Backward elimination:
  Initial attribute set: {A1, A2, A3, A4, A5, A6}
  => {A1, A3, A4, A5, A6}
  => {A1, A4, A5, A6}
  => Reduced attribute set: {A1, A4, A6}

Decision tree induction:
  Initial attribute set: {A1, A2, A3, A4, A5, A6}
  Induced tree: A4? (Y: A1? -> Y: Class 1, N: Class 2; N: A6? -> Y: Class 1, N: Class 2)
  => Reduced attribute set: {A1, A4, A6}

Figure 3.6 Greedy (heuristic) methods for attribute subset selection.
⁴ In machine learning, attribute subset selection is known as feature subset selection.
⁵ The information gain measure is described in detail in Chapter 8.
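
As an illustration of the decision tree induction approach in Figure 3.6, the sketch below assumes scikit-learn is available: a tree is induced over all attributes, and any attribute that never appears in a split of the tree is taken to be irrelevant and dropped.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_based_selection(X, y, attribute_names):
    """Keep only the attributes actually used by an induced decision tree."""
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    # feature_importances_ is zero for attributes unused by any split.
    used = np.flatnonzero(tree.feature_importances_ > 0)
    return [attribute_names[i] for i in used]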