


Figure 4.25a shows the classification matrix for the two-class cork stoppers problem using the whole ten-feature set and equal prevalences. The performance did not increase significantly compared with the two-feature solution presented previously, and is worse than the solution using the four-feature vector [ART PRM NG RAAR]', as shown in Figure 4.25b.
There are, however, further compelling reasons for not using a large number of features. As a matter of fact, when using estimates of means and covariance derived from a training set, we are designing a biased classifier, fitted to the training set (review section 2.6 again). Therefore, we should expect that our training set error estimates are, on average, optimistic. On the other hand, error estimates obtained in independent test sets are expected to be, on average, pessimistic.
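This bias is easy to reproduce numerically. The following sketch (not from the book; it assumes two Gaussian classes with identity covariance, equal prevalences, and a linear discriminant built from sample means and a pooled covariance estimate, with illustrative sizes d = 10 and 25 training patterns per class) repeatedly draws small design sets and compares the resubstitution (training-set) error with the error measured on a large independent set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian classes in d dimensions, identity covariance, equal prevalences.
# The means differ by delta along one axis, so the Bayes error is Phi(-delta/2),
# i.e. about 0.159 for delta = 2.
d, delta, n_train, n_big, n_runs = 10, 2.0, 25, 20000, 200
mu0 = np.zeros(d)
mu1 = np.zeros(d); mu1[0] = delta

def linear_discriminant(X0, X1):
    """Linear discriminant from estimated means and a pooled covariance."""
    m0, m1 = X0.mean(0), X1.mean(0)
    S = (np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)) / 2
    w = np.linalg.solve(S, m1 - m0)
    b = -0.5 * w @ (m0 + m1)
    return lambda X: (X @ w + b > 0).astype(int)   # predicted class labels

def error(clf, X0, X1):
    # Equal prevalences: average of the two class-conditional error rates.
    return 0.5 * (clf(X0).mean() + (1 - clf(X1)).mean())

train_err, test_err = [], []
for _ in range(n_runs):
    X0 = rng.multivariate_normal(mu0, np.eye(d), n_train)
    X1 = rng.multivariate_normal(mu1, np.eye(d), n_train)
    clf = linear_discriminant(X0, X1)
    train_err.append(error(clf, X0, X1))            # resubstitution (training-set) estimate
    T0 = rng.multivariate_normal(mu0, np.eye(d), n_big)
    T1 = rng.multivariate_normal(mu1, np.eye(d), n_big)
    test_err.append(error(clf, T0, T1))             # near-true error of the designed classifier

print("mean training-set error :", np.mean(train_err))   # optimistic: below the Bayes error
print("mean independent error  :", np.mean(test_err))    # pessimistic: above the Bayes error
```

With these (assumed) settings the average resubstitution error falls clearly below the Bayes error, while the average error on independent data exceeds it, which is the behaviour described above.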
The bibliography at the end of the present chapter includes references explaining the mathematical details of this issue. We present here some important results as an aid for the designer to choose sensible values for the dimensionality ratio, n/d. Later, when we discuss the topic of classifier evaluation, we will come back to this issue from another perspective.
                              Let us denote:

Pe       -  Probability of error of an optimum Bayesian classifier.
Pe_d(n)  -  Training (design) set estimate of Pe for n patterns.
Pe_t(n)  -  Test set estimate of Pe for n patterns.

Thus, the quantity Pe_d(n) represents an estimate of Pe influenced only by the finite size of the design set, i.e., the classifier error is measured exactly and its deviation from Pe is due solely to the finiteness of the design set; the quantity Pe_t(n) represents an estimate of Pe influenced only by the finite size of the test set, i.e., the error of the Bayesian classifier is estimated by counting how many of n patterns are misclassified. These quantities verify Pe_d(∞) = Pe and Pe_t(∞) = Pe, i.e., they converge to the theoretical value Pe with increasing values of n.
In normal practice, these error probabilities are not known exactly. Instead, we compute estimates of these probabilities, Pe_d and Pe_t, as percentages of misclassified patterns, in exactly the same way as we have done in the classification matrices presented so far. The probability of obtaining k misclassified patterns out of n, for a classifier with a theoretical error Pe, is given by the binomial law:

\[
P(k) = \binom{n}{k}\, Pe^{k} (1 - Pe)^{n-k}, \qquad k = 0, 1, \ldots, n.
\]
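As a quick numerical illustration of this binomial law (a sketch with arbitrary, assumed values Pe = 0.10 and n = 50, not figures taken from the cork-stoppers experiments), one can tabulate the probability of each possible number of misclassifications:

```python
from scipy.stats import binom

# Assumed (illustrative) values: a classifier with true error Pe evaluated on n test patterns.
Pe, n = 0.10, 50

# P(k) = C(n, k) * Pe^k * (1 - Pe)^(n - k)
for k in range(11):
    print(f"k = {k:2d}   P(k) = {binom.pmf(k, n, Pe):.4f}")

# Probability of observing at most 8 misclassifications, i.e. an error estimate k/n <= 0.16.
print("P(k <= 8) =", binom.cdf(8, n, Pe))
```

The spread of this distribution shows how much an error estimate based on only 50 test patterns can deviate from the true Pe.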
The maximum likelihood estimation of Pe under this binomial law is precisely:

\[
\widehat{Pe} = \frac{k}{n}.
\]
The standard deviation of this estimate is simply:

\[
\sigma\big(\widehat{Pe}\big) = \sqrt{\frac{Pe\,(1 - Pe)}{n}}.
\]
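A short sketch of how these two formulas are used in practice (the counts below are invented for illustration; in practice Pe is unknown, so its estimate is plugged into the standard deviation formula):

```python
import math

# Invented example: k misclassified patterns observed out of n test patterns.
k, n = 13, 150

pe_hat = k / n                                   # maximum likelihood estimate of Pe
std = math.sqrt(pe_hat * (1 - pe_hat) / n)       # standard deviation, with Pe replaced by its estimate

print(f"Pe estimate        : {pe_hat:.3f}")
print(f"standard deviation : {std:.3f}")
print(f"approx. 95% range  : [{pe_hat - 2*std:.3f}, {pe_hat + 2*std:.3f}]")
```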