Page 321 - Introduction to Statistical Pattern Recognition
7  Nonparametric Classification and Error Estimation


                    kNN Approach

                         Classifier: Using the kNN density estimate of Chapter 6, the likelihood
                    ratio classifier becomes

                    $$
                    -\ln\frac{\hat{p}_1(X)}{\hat{p}_2(X)}
                      = -n \ln\frac{d_2(X^{(2)}_{k_2 NN}, X)}{d_1(X^{(1)}_{k_1 NN}, X)}
                        - \ln\frac{(k_1-1)\, N_2\, |\Sigma_2|^{1/2}}{(k_2-1)\, N_1\, |\Sigma_1|^{1/2}}
                      \underset{\omega_2}{\overset{\omega_1}{\lessgtr}} t ,
                    \tag{7.5}
                    $$

                    where $v_i = \pi^{n/2}\,\Gamma^{-1}(n/2+1)\,|\Sigma_i|^{1/2}\, d_i^n$ from (B.1), and
                    $d_i^2(Y,X) = (Y-X)^T \Sigma_i^{-1} (Y-X)$. In order to classify a test sample $X$,
                    the $k_1$th NN from $\omega_1$ and the $k_2$th NN from $\omega_2$ are found, the
                    distances from $X$ to these neighbors are measured, and these distances are
                    inserted into (7.5) to test whether the left-hand side is smaller or larger
                    than $t$. In order to avoid unnecessary complexity, $k_1 = k_2$ is assumed in
                    this chapter.
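
                    The classification procedure just described can be sketched in a few lines
                    of NumPy. This is only an illustrative sketch, not code from the text: the
                    function names (kth_nn_distance, knn_statistic, classify) are hypothetical,
                    the covariance matrices are assumed to be given and invertible, and
                    k1 = k2 = k >= 2 is assumed so that the (k - 1) factors cancel.

```python
import numpy as np

def kth_nn_distance(x, samples, cov, k):
    """Distance from x to its kth nearest neighbor among the rows of `samples`,
    using the metric d^2(Y, X) = (Y - X)^T cov^{-1} (Y - X)."""
    diff = samples - x
    # Quadratic form for every row j: diff_j^T cov^{-1} diff_j
    d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return np.sqrt(np.sort(d2)[k - 1])

def knn_statistic(x, X1, X2, cov1, cov2, k):
    """Left-hand side of (7.5) under the assumption k1 = k2 = k (k >= 2)."""
    n = x.shape[0]
    N1, N2 = len(X1), len(X2)
    d1 = kth_nn_distance(x, X1, cov1, k)
    d2 = kth_nn_distance(x, X2, cov2, k)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    # Second term: -ln( N2 |S2|^{1/2} / (N1 |S1|^{1/2}) ); (k - 1) cancels.
    bias = np.log(N2 / N1) + 0.5 * (logdet2 - logdet1)
    return -n * np.log(d2 / d1) - bias

def classify(x, X1, X2, cov1, cov2, k, t=0.0):
    """Return 1 (omega_1) if the statistic is below t, otherwise 2 (omega_2)."""
    return 1 if knn_statistic(x, X1, X2, cov1, cov2, k) < t else 2
```

                    A test sample lying close to the class-1 design samples has a small d1 and
                    a large d2, so the statistic is strongly negative and the sample is assigned
                    to class 1 for any moderate threshold t.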

                         Error estimation: The classification error based on a given data set S
                    can be estimated by using the L and R methods. When $X_i^{(1)}$ from $\omega_1$
                    is tested by the R method, $X_i^{(1)}$ must be included as a member of the design
                    set. Therefore, when the kNN's of $X_i^{(1)}$ are found from the $\omega_1$ design
                    set, $X_i^{(1)}$ itself is included among these kNN's. Figure 7-1 shows how the
                    kNN's are selected and how the distances to the kth NN's are measured for
                    $k = 2$. Note in Fig. 7-1 that the locus of points equidistant from $X_i^{(1)}$
                    becomes ellipsoidal because the distance is normalized by $\Sigma_i$. Also, since
                    $\Sigma_1 \neq \Sigma_2$ in general, two different ellipsoids are used for
                    $\omega_1$ and $\omega_2$. In the R method, $X_i^{(1)}$ and $X_{NN}^{(1)}$ are the
                    nearest and second nearest neighbors of $X_i^{(1)}$ from $\omega_1$, while
                    $X_{NN}^{(2)}$ and $X_{2NN}^{(2)}$ are the nearest and second nearest neighbors of
                    $X_i^{(1)}$ from $\omega_2$. Thus,









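                    The difference between the R-method neighbor selection above and the L
                    method described in the next paragraph comes down to whether the tested
                    sample counts as its own nearest neighbor. The following is a minimal
                    sketch under that reading. The function name kth_nn_distance_design and the
                    include_self flag are hypothetical, and the toy data are made up purely for
                    illustration.

```python
import numpy as np

def kth_nn_distance_design(i, samples, cov, k, include_self):
    """kth-NN distance from samples[i] to the design set of its own class.

    include_self=True  mimics the R method: samples[i] stays in the design
                       set, so it is its own nearest neighbor (distance 0).
    include_self=False mimics the L method: samples[i] is removed first.
    """
    diff = samples - samples[i]
    # Squared distances normalized by the class covariance matrix.
    d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    if not include_self:
        d2 = np.delete(d2, i)  # drop the tested sample itself
    return np.sqrt(np.sort(d2)[k - 1])

# Toy example (illustrative data only): three class-1 samples on a line.
X1 = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
cov1 = np.eye(2)
d_R = kth_nn_distance_design(0, X1, cov1, k=2, include_self=True)   # 1.0
d_L = kth_nn_distance_design(0, X1, cov1, k=2, include_self=False)  # 3.0
```

                    For k = 2 the R method measures the distance to the first true neighbor,
                    while the L method measures the distance to the second one. The own-class
                    distance is therefore never larger under R than under L, which pushes the
                    statistic in (7.5) toward the sample's own class.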
                         On the other hand, in the L method, $X_i^{(1)}$ is no longer considered a
                    member of the design set. Therefore, $X_{NN}^{(1)}$ and $X_{2NN}^{(1)}$ are
                    selected as the nearest and second nearest neighbors of $X_i^{(1)}$ from
                    $\omega_1$. The selection of $\omega_2$ neighbors is the same as before. Thus,