Page 374 - Introduction to Statistical Pattern Recognition




                     perty will be true regardless of the underlying distribution. Table 7-6 shows
                     the amounts of shift for various values of r(X). However, it must be noted
                     that these risk lines should not be drawn around the theoretical Bayes risk line
                     (the solid line of Fig. 7-15). The kNN density estimates, and subsequently the
                     estimate of r(X), are heavily biased, as discussed in the previous sections. In
                     order to compensate for these biases, the threshold terms of (7.80) and (7.81) must
                     be adjusted and will differ from the theoretical values indicated in (. ). Further
                     shift due to ln r(X)/(1-r(X)) must start from the adjusted threshold.

                                                TABLE 7-6

                                      SHIFT OF THRESHOLD DUE TO r

                                   r     0.5    0.4     0.3     0.2     0.1

                                  ±Δt    0      0.405   0.847   1.386   2.197
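
The entries of Table 7-6 are consistent with a shift of magnitude |ln r/(1-r)| = ln (1-r)/r. The following is a minimal sketch that recomputes them under that reading; the function name `threshold_shift` is hypothetical and introduced only for illustration. It gives the shift relative to whatever threshold is used as the starting point, and does not include the bias adjustment of the threshold described above.

```python
import math


def threshold_shift(r):
    """Magnitude of the threshold shift for the constant-risk line r(X) = r.

    Assumes the shift is |ln(r / (1 - r))|, which matches Table 7-6.
    """
    if not 0.0 < r <= 0.5:
        raise ValueError("r must lie in (0, 0.5]")
    return math.log((1.0 - r) / r)


# Recompute the row of shifts for the r values listed in the table.
for r in (0.5, 0.4, 0.3, 0.2, 0.1):
    print(f"r = {r:.1f}   shift = {threshold_shift(r):.3f}")
```

The shift is zero at r = 0.5, where the constant-risk line coincides with the classification boundary, and grows as r decreases.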




                          These constant risk lines allow the analyst to identify samples in a reject
                     region easily [17-18]. For a given reject threshold z, the reject region on the
                     display is the area between the two 45° lines specified by r(X) = z, in which
                     r(X) > z is satisfied and accordingly samples are rejected.
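
The reject rule can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the text: it assumes the display axes are -ln p1(X) and -ln p2(X), that the prior probabilities P1 and P2 are known, and that the unadjusted theoretical threshold ln(P1/P2) is used, whereas the text notes that the threshold must be adjusted when kNN density estimates are used. The function names `risk` and `in_reject_region` are hypothetical.

```python
import math


def risk(neg_log_p1, neg_log_p2, prior1, prior2):
    """Conditional risk r(X) = min(q1, q2), with q_i the posterior probabilities."""
    a = prior1 * math.exp(-neg_log_p1)
    b = prior2 * math.exp(-neg_log_p2)
    return min(a, b) / (a + b)


def in_reject_region(neg_log_p1, neg_log_p2, prior1, prior2, z):
    """True if the display point lies strictly between the two lines r(X) = z.

    r(X) > z is equivalent to |ln(q1/q2)| < ln((1 - z) / z), i.e. the point is
    within ln((1 - z) / z) of the classification boundary on either side.
    """
    if not 0.0 < z <= 0.5:
        raise ValueError("z must lie in (0, 0.5]")
    # ln(q1/q2) = ln(P1/P2) + ln p1(X) - ln p2(X)
    log_ratio = math.log(prior1 / prior2) - neg_log_p1 + neg_log_p2
    return abs(log_ratio) < math.log((1.0 - z) / z)


# Two display points with equal priors and reject threshold z = 0.4.
for point in ((1.0, 1.2), (1.0, 2.0)):
    r = risk(point[0], point[1], 0.5, 0.5)
    rejected = in_reject_region(point[0], point[1], 0.5, 0.5, 0.4)
    print(f"point = {point}   r(X) = {r:.3f}   rejected = {rejected}")
```

Testing the band condition on the display coordinates gives the same decision as computing r(X) and comparing it with z directly, which is why the region can be read off the display as the area between two parallel lines.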


                          Grouped error estimate: An obvious method of error estimation on the
                     display is to count the number of $\omega_1$- and $\omega_2$-samples in the
                     $\omega_2$- and $\omega_1$-regions, respectively. Another possible method is to
                     read $\hat{r}(X_i)$ for each $X_i$, and to compute the sample mean as

                          $$\hat{\varepsilon} = \frac{1}{N}\sum_{i=1}^{N}\hat{r}(X_i) \tag{7.82}$$



                      because the Bayes error is expressed by $\varepsilon^* = E\{r(X)\}$. This estimate is called
                      the grouped estimate [19-20]. The randomness of $\hat{\varepsilon}$ comes from two sources:
                      one from the estimation of r by $\hat{r}$, and the other from $X_i$. When the conventional
                      error-counting process is used, we design a classifier by estimating the density