Page 357 - Introduction to Statistical Pattern Recognition

7  Nonparametric Classification and Error Estimation          339



                         Estimated  kernel covariance:  So far,  we  have  assumed  that  the class
                    covariance matrices are known and given.  However, in practice, these covariance
                    matrices  are  unknown  and  must  be  estimated  from  a  finite  number  of
                    samples.  This may  lead to an optimistically biased  error estimate.  For exam-
                    ple, in Data RADAR, upper and lower bounds of  the Bayes error are estimated
                    by  the L and R  methods.  If  720 samples per class  are used  in  this  operation
                    with  the  class covariance matrices estimated from 8800 samples per class, the
                    resulting upper and  lower bounds are  17.8% and  16.2% respectively.  On the
                    other  hand,  if  the  same  720  samples  are  used  to  estimate  the  covariance
                    matrices, bounds of  8.4% and  7.1%  result.  These bounds are further lowered
                    to 5.2% and 3.8%, when 360 samples per class are used in both the error esti-
                    mation  procedure  and  the  covariance  estimation.  These  results  demonstrate
                    that  the  upper  bound  of  the  Bayes  error  in  the  L  method  may  be  severely
                    biased.  Thus, the estimate may  no  longer give the upper bound  of the Bayes
                    error, if  the  class covariances are estimated from the  same data  used  to  form
                    the  error estimates.  If  possible, then,  to  avoid this  bias,  one  should estimate
                    the class covariances using a large number of independent samples.  Once the
                    covariances are estimated accurately, we may use a relatively small sample size
                    for the nonparametric procedures to produce reliable results.  However, if addi-
                    tional  samples for  estimation of  the  covariance matrices are  not  available,  in
                    order to obtain reliable upper bounds on the Bayes error, one must  use leave-
                    one-out type estimates of the kernel covariances when forming the L error esti-
                    mate.  This  implies the  use  of  a  different covariance  matrix  for each  sample
                    being tested.
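
                    The leave-one-out covariance estimate described above can be sketched in a few
                    lines of Python.  This is only an illustration under stated assumptions, not the
                    book's procedure: the function names and the toy data are hypothetical, the
                    unbiased divisor N - 1 is assumed for the sample covariance, and each estimate is
                    recomputed from scratch rather than updated from the full-sample estimate.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sample_covariance(samples: List[Vector]) -> Matrix:
    """Sample covariance of n-dimensional samples (assumed divisor: N - 1)."""
    n_samples = len(samples)
    if n_samples < 2:
        raise ValueError("need at least two samples")
    dim = len(samples[0])
    mean = [sum(x[a] for x in samples) / n_samples for a in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for x in samples:
        diff = [x[a] - mean[a] for a in range(dim)]
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += diff[a] * diff[b]
    return [[c / (n_samples - 1) for c in row] for row in cov]


def leave_one_out_covariances(samples: List[Vector]) -> List[Matrix]:
    """One covariance estimate per sample, each computed without that sample."""
    return [
        sample_covariance(samples[:k] + samples[k + 1:])
        for k in range(len(samples))
    ]


# Toy data, made up for illustration only: four clustered points and one outlier.
class_samples = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [10.0, 10.0]]
full_cov = sample_covariance(class_samples)
loo_covs = leave_one_out_covariances(class_samples)
```

                    In this toy set, the estimate that excludes the outlier (loo_covs[4]) is much
                    tighter than full_cov, so the outlier lies farther from its own class when it is
                    tested against a covariance it did not help to estimate.  Including the tested
                    sample in the covariance estimate has the opposite effect, which is consistent
                    with the optimistic bias discussed above.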
                         In order to show how the kernel covariance can be estimated by the L
                    method, let us study the kernel function of (6.3).  In Parzen error estimation,
                    this kernel function is inserted into (7.2) and (7.3) to test a sample
                    $X_k^{(\ell)}$ from $\omega_\ell$ in the L method.  Using $A_i = \Sigma_i$, which is a
                    good choice in many applications, we need to compute $|\Sigma_i|$ and
                    $d_i^2(X_k^{(\ell)}, X_j^{(i)}) = (X_k^{(\ell)} - X_j^{(i)})^T \Sigma_i^{-1} (X_k^{(\ell)} - X_j^{(i)})$.
                    When the covariance matrix $\Sigma_i$ is not known and needs to be estimated from
                    $X_1^{(i)}, \ldots, X_{N_i}^{(i)}$, $\Sigma_i$ is replaced by its estimate, $\hat{\Sigma}_i$,
                    and subsequently $d_i^2$ by
                    $\hat{d}_i^2(X_k^{(\ell)}, X_j^{(i)}) = (X_k^{(\ell)} - X_j^{(i)})^T \hat{\Sigma}_i^{-1} (X_k^{(\ell)} - X_j^{(i)})$.
                    The L type estimate of the kernel covariance means that, when $X_k^{(\ell)}$ is
                    tested, $X_k^{(\ell)}$ must be excluded from the sample set used to estimate
                    $\Sigma_\ell$.  Letting $\hat{\Sigma}_{\ell k}$ be the resulting estimate, $\hat{\Sigma}_\ell$ and
                    $\hat{d}_\ell^2$ now must be replaced by $\hat{\Sigma}_{\ell k}$ and $\hat{d}_{\ell k}^2$, while
                    $\hat{\Sigma}_i$ and $\hat{d}_i^2$ $(i \neq \ell)$ are kept unchanged.
                    When the sample covariance of (5.9) is used, $|\hat{\Sigma}_{\ell k}|$ and
                    $\hat{d}_{\ell k}^2$ can be easily