
224                        Introduction to Statistical Pattern Recognition


computation time, the addition of a scalar multiplication is negligibly small. Thus, we can perform both the R and L methods simultaneously within the computation time needed to conduct the R method alone. In other words, (5.121) and (5.122) give a simple perturbation equation of the L method from the R method, such that we do not need to design the classifier N times.
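This one-pass computation can be sketched as follows (a sketch, not the book's own code: the nearest-mean rule and the perturbation factor [N_i/(N_i − 1)]^2 follow the surrounding text, while the function name and test data are illustrative):

```python
import numpy as np

def r_and_l_errors(X1, X2):
    """Resubstitution (R) and leave-one-out (L) error estimates of the
    nearest-mean (distance) classifier, computed in a single pass."""
    N1, N2 = len(X1), len(X2)
    M1, M2 = X1.mean(axis=0), X2.mean(axis=0)

    # Squared distances used by the R method.
    d11 = ((X1 - M1) ** 2).sum(axis=1)   # omega_1 samples to own mean
    d12 = ((X1 - M2) ** 2).sum(axis=1)   # omega_1 samples to other mean
    d21 = ((X2 - M1) ** 2).sum(axis=1)
    d22 = ((X2 - M2) ** 2).sum(axis=1)
    r_err = ((d11 > d12).sum() + (d22 > d21).sum()) / (N1 + N2)

    # L method: only the own-class distance changes, and only by the
    # perturbation factor [N_i/(N_i - 1)]^2 -- no classifier redesign.
    f1, f2 = (N1 / (N1 - 1)) ** 2, (N2 / (N2 - 1)) ** 2
    l_err = ((f1 * d11 > d12).sum() + (f2 * d22 > d21).sum()) / (N1 + N2)
    return r_err, l_err
```

Because the own-class distance is merely rescaled, the L decisions cost one scalar multiplication per sample on top of the R computation.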
The perturbation factor N_i/(N_i − 1) is always larger than 1. This increases (X_k^(1) − M̂_1)^T (X_k^(1) − M̂_1) for an ω_1-sample, X_k^(1), and (X_k^(2) − M̂_2)^T (X_k^(2) − M̂_2) for an ω_2-sample, X_k^(2). For ω_1, X_k^(1) is misclassified if > is satisfied in (5.121). Therefore, increasing the (X_k^(1) − M̂_1)^T (X_k^(1) − M̂_1) term by multiplying [N_1/(N_1 − 1)]^2 means that X_k^(1) has more chance to be misclassified in the L method than in the R method. The same is true for X_k^(2) in (5.122). Thus, the L method gives a larger error than the R method. This is true even if the classifier of (5.117) is no longer the Bayes. That is, when the distance classifier of (5.117) is used, the L error is larger than the R error regardless of the test distributions.
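Where the factor comes from can be seen in one line of algebra (a reconstruction, using the same symbols as the surrounding text: removing X_k^(1) from the N_1 samples that form M̂_1):

```latex
\hat{M}_{1k} = \frac{N_1 \hat{M}_1 - X_k^{(1)}}{N_1 - 1}
\quad\Longrightarrow\quad
X_k^{(1)} - \hat{M}_{1k}
  = \frac{N_1}{N_1 - 1}\,\bigl(X_k^{(1)} - \hat{M}_1\bigr),
```

so the own-class squared distance in the L method is the R-method distance multiplied by [N_1/(N_1 − 1)]^2, which is exactly the factor discussed above.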
The above discussion may be illustrated in the one-dimensional example of Fig. 5-3, where M̂_1 and M̂_2 are computed by the sample means of all available










[Figure: one-dimensional illustration of the distances d_1R, d_2R and d_1L, d_2L.]

Fig. 5-3  An example of the leave-one-out error estimation.

samples, and each sample is classified according to the nearest mean. For example, x_k^(1) is correctly classified by the R method, because d_1R < d_2R and thus x_k^(1) is classified to ω_1. On the other hand, in the L method, x_k^(1) must be excluded from estimating the ω_1-mean. The new sample mean, M̂_1k, is shifted to the left side, thus increasing the distance between x_k^(1) and M̂_1k, d_1L. On the other hand, d_2L is the same as d_2R. Since d_1L > d_2L, x_k^(1) is misclassified to ω_2 in the L method.
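The mechanism of Fig. 5-3 is easy to reproduce numerically (the data below are hypothetical, chosen only so that the borderline sample sits between the two class means, slightly nearer its own):

```python
import numpy as np

# Hypothetical one-dimensional samples in the spirit of Fig. 5-3.
X1 = np.array([-2.0, -1.0, 0.9])   # omega_1 samples; x = X1[2] is borderline
X2 = np.array([2.5, 3.0, 4.0])     # omega_2 samples
x = X1[2]

# R method: both means use ALL available samples.
M1, M2 = X1.mean(), X2.mean()
d1R, d2R = abs(x - M1), abs(x - M2)
print(d1R < d2R)   # True -> R method classifies x correctly to omega_1

# L method: exclude x from the omega_1 mean; M1k shifts away from x,
# so d1L grows while d2L stays equal to d2R.
M1k = (len(X1) * M1 - x) / (len(X1) - 1)
d1L, d2L = abs(x - M1k), d2R
print(d1L > d2L)   # True -> L method misclassifies x to omega_2
```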