

The area under the ROC curve is computed by SPSS with a 95% confidence interval. For the FHR-Apgar data these areas are 0.709 ± 0.11 and 0.781 ± 0.10 for ABLTV and ABSTV, respectively.
Despite some shortcomings, the ROC curve area is a popular way of assessing classifier performance. This method and an alternative one based on information theory are described in Metz et al. (1973).
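As a minimal illustration of this kind of computation, the following Python sketch estimates the area under the ROC curve together with an approximate 95% confidence interval. The labels and scores are synthetic stand-ins (not the FHR-Apgar data), and a bootstrap interval is used here instead of the analytic interval reported by SPSS.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    # Hypothetical two-class data: y are the class labels, s the classifier scores.
    y = rng.integers(0, 2, size=200)
    s = y + rng.normal(scale=1.5, size=200)

    auc = roc_auc_score(y, s)

    # Bootstrap the AUC to obtain an approximate 95% confidence interval.
    boot = []
    for _ in range(2000):
        idx = rng.integers(0, len(y), size=len(y))
        if len(np.unique(y[idx])) < 2:      # a resample must contain both classes
            continue
        boot.append(roc_auc_score(y[idx], s[idx]))
    low, high = np.percentile(boot, [2.5, 97.5])
    print(f"AUC = {auc:.3f}   95% CI = [{low:.3f}, {high:.3f}]")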


                          4.4  Feature Selection


                          As  already seen in sections 2.7 and 4.2.3, great care must be exercised in reducing
                          the  number  of  features  used  by  a  classifier,  in  order  to  maintain  a  high
                           dimensionality ratio and therefore reproducible performance, with error estimates
                           sufficiently near the theoretical value. For this purpose, several feature assessment
                           techniques were already explained in chapter 2 with  the aim of discarding features
                           that are clearly non-useful at an initial stage of the PR project.
The feature assessment task, while assuring that an information-carrying feature set is indeed used in a PR project, does not guarantee that a given classifier needs the whole set. Consider, for instance, that we are presented with a set of two-dimensional patterns described by feature vectors consisting of 4 features, x1, x2, x3 and x4, with x3 and x4 being the eigenvectors of the covariance matrix of x1 and x2. Assuming that the true dimension of the patterns is not known, statistical tests find that all features contribute to pattern discrimination. However, this discrimination could be performed equally well using the alternative sets {x1, x2} or {x3, x4}. Briefly, discarding features with no aptitude for pattern discrimination is no guarantee against redundant features, and it is, therefore, good practice to attempt some sort of feature selection.
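This situation can be reproduced with a few lines of code. In the sketch below (synthetic data; for brevity x3 and x4 are taken as plain linear combinations of x1 and x2 rather than the eigenvector-derived features of the example above), every feature passes a univariate ANOVA F test, yet the subset {x1, x2} classifies as well as the full set.

    import numpy as np
    from sklearn.feature_selection import f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    n = 200
    y = np.repeat([0, 1], n // 2)
    x1 = rng.normal(loc=1.5 * y, scale=1.0)   # discriminative original features
    x2 = rng.normal(loc=1.0 * y, scale=1.0)
    x3 = 0.7 * x1 + 0.3 * x2                  # redundant: derived from x1 and x2
    x4 = 0.4 * x1 + 0.6 * x2
    X = np.column_stack([x1, x2, x3, x4])

    F, p = f_classif(X, y)                    # each feature discriminates on its own
    print("F statistics:", np.round(F, 1), " p-values:", np.round(p, 4))

    clf = LogisticRegression(max_iter=1000)
    print("CV accuracy, all four features:", cross_val_score(clf, X, y, cv=5).mean())
    print("CV accuracy, {x1, x2} only   :", cross_val_score(clf, X[:, :2], y, cv=5).mean())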
There is abundant literature on the topic of feature selection. Important references are included in the bibliography. The most popular methods of feature selection use a search procedure for a feature subset obeying a stipulated merit criterion. Let F_t be the original set of t features and F be any subset whose cardinality |F| is the desired dimensionality d, i.e. |F| = d. Furthermore, let J(F) represent the merit criterion used in the selection. The problem of feature selection is to find a subset F* such that:

   J(F*) = max_{F ⊂ F_t, |F| = d} J(F).

A possible choice for J(F) is 1 − Pe, with the disadvantage that the feature selection process then depends on the chosen type of classifier. More often, a class separability criterion such as the Bhattacharyya distance or the ANOVA F statistic is used.
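For small t, the maximization above can be carried out by exhaustive enumeration of all subsets of cardinality d. The sketch below (the helper names bhattacharyya and best_subset are ours) uses the two-class Gaussian Bhattacharyya distance as the merit criterion J(F).

    from itertools import combinations
    import numpy as np

    def bhattacharyya(X0, X1):
        """Two-class Bhattacharyya distance, assuming Gaussian class densities."""
        m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
        C0 = np.atleast_2d(np.cov(X0, rowvar=False))
        C1 = np.atleast_2d(np.cov(X1, rowvar=False))
        C = (C0 + C1) / 2
        diff = m1 - m0
        return (diff @ np.linalg.solve(C, diff) / 8
                + 0.5 * np.log(np.linalg.det(C)
                               / np.sqrt(np.linalg.det(C0) * np.linalg.det(C1))))

    def best_subset(X, y, d):
        """Exhaustive search for the subset F* of cardinality d maximizing J(F)."""
        X0, X1 = X[y == 0], X[y == 1]
        return max(combinations(range(X.shape[1]), d),
                   key=lambda F: bhattacharyya(X0[:, list(F)], X1[:, list(F)]))

    # Example: F_star = best_subset(X, y, d=2) returns the indices of the two
    # features with the largest Bhattacharyya distance between the classes.

For larger values of t exhaustive enumeration becomes impracticable, and one of the sequential search methods mentioned next is used instead.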
                              As for the search method, there is a broad scope of possibilities. In the following
                            we  mention  several relevant  methods, many  of  which  can  be  found  in  available
                            software products.