Page 321 - Introduction to Statistical Pattern Recognition


7 Nonparametric Classification and Error Estimation 303

kNN Approach

Classifier: Using the kNN density estimate of Chapter 6, the likelihood
ratio classifier becomes

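$$-\ln\frac{\hat p_1(X)}{\hat p_2(X)} = -n\ln\frac{d_2(X^{(2)}_{k_2NN},X)}{d_1(X^{(1)}_{k_1NN},X)} - \ln\frac{(k_1-1)N_2}{(k_2-1)N_1} - \ln\frac{|\Sigma_2|^{1/2}}{|\Sigma_1|^{1/2}} \;\underset{\omega_2}{\overset{\omega_1}{\lessgtr}}\; t \qquad (7.5)$$
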
where $v_i = \pi^{n/2}\Gamma^{-1}(n/2+1)|\Sigma_i|^{1/2}d_i^n(X^{(i)}_{k_iNN},X)$ from (B.1) is the volume of the hyperellipsoid that reaches the $k_i$th NN, used in the kNN estimate $\hat p_i(X) = (k_i-1)/(N_i v_i)$, and $d_i^2(Y,X) = (Y-X)^T\Sigma_i^{-1}(Y-X)$. To classify a test sample $X$, the $k_1$th NN from $\omega_1$ and the $k_2$th NN from $\omega_2$ are found, the distances from $X$ to these neighbors are measured, and these distances are inserted into (7.5) to test whether the left-hand side is smaller or larger than $t$. To avoid unnecessary complexity, $k_1 = k_2$ is assumed in this chapter.
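The classifier above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the book's procedure: $\Sigma_i$ is estimated by the sample covariance of the design set $S_i$ (`np.cov`), $k_1 = k_2 = k \ge 2$ so the $(k-1)$ factors cancel, and the function names `knn_distance`, `knn_log_density`, `knn_lhs`, `classify` and the default `t=0.0` are hypothetical. `knn_log_density` evaluates $\ln\hat p_i(X) = \ln(k-1) - \ln N_i - \ln v_i$ directly, so $\ln\hat p_2 - \ln\hat p_1$ should reproduce the left-hand side of (7.5).

```python
import math
import numpy as np


def knn_distance(X, design, cov, k):
    """Distance from X to its kth NN among the rows of `design`,
    using d^2(Y, X) = (Y - X)^T cov^{-1} (Y - X)."""
    D = design - X                                     # (N, n) differences
    d2 = np.einsum("ij,jk,ik->i", D, np.linalg.inv(cov), D)
    return math.sqrt(np.partition(d2, k - 1)[k - 1])   # kth smallest distance


def knn_log_density(X, S, k):
    """ln p_hat(X) = ln(k - 1) - ln N - ln v, with
    v = pi^{n/2} / Gamma(n/2 + 1) * |Sigma|^{1/2} * d^n."""
    n = X.shape[0]
    cov = np.cov(S, rowvar=False)      # assumption: Sigma_i = sample covariance
    d = knn_distance(X, S, cov, k)
    logdet = np.linalg.slogdet(cov)[1]
    log_v = (0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1)
             + 0.5 * logdet + n * math.log(d))
    return math.log(k - 1) - math.log(len(S)) - log_v


def knn_lhs(X, S1, S2, k):
    """Left-hand side of (7.5) with k1 = k2 = k (the (k - 1) factors cancel)."""
    n = X.shape[0]
    cov1, cov2 = np.cov(S1, rowvar=False), np.cov(S2, rowvar=False)
    d1 = knn_distance(X, S1, cov1, k)
    d2 = knn_distance(X, S2, cov2, k)
    logdet1 = np.linalg.slogdet(cov1)[1]
    logdet2 = np.linalg.slogdet(cov2)[1]
    return (-n * math.log(d2 / d1)
            - math.log(len(S2) / len(S1))
            - 0.5 * (logdet2 - logdet1))


def classify(X, S1, S2, k, t=0.0):
    """Return 1 (omega_1) if the left-hand side is below t, else 2 (omega_2)."""
    return 1 if knn_lhs(X, S1, S2, k) < t else 2
```

Here `S1` and `S2` are arrays whose rows are samples, and `X` is a single sample of length $n \ge 2$. Computing the statistic from logarithms of distances and `slogdet` keeps it numerically stable even when $d_i^n$ or $|\Sigma_i|$ would underflow.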

Error estimation: The classification error based on a given data set $S$ can be estimated by using the L and R methods. When $X_k^{(1)}$ from $\omega_1$ is tested by the R method, $X_k^{(1)}$ must be included as a member of the design set. Therefore, when the kNN's of $X_k^{(1)}$ are found from the $\omega_1$ design set, $X_k^{(1)}$ itself is included among these kNN's. Figure 7-1 shows how the kNN's are selected and how the distances to the kth NN's are measured for $k = 2$. Note in Fig. 7-1 that the locus of points equidistant from $X_k^{(1)}$ is an ellipsoid because the distance is normalized by $\Sigma_i$. Also, since $\Sigma_1 \ne \Sigma_2$ in general, two different ellipsoids are used for $\omega_1$ and $\omega_2$. In the R method, $X_k^{(1)}$ and $X_{1NN}^{(1)}$ are the nearest and second nearest neighbors of $X_k^{(1)}$ from $\omega_1$, while $X_{1NN}^{(2)}$ and $X_{2NN}^{(2)}$ are the nearest and second nearest neighbors of $X_k^{(1)}$ from $\omega_2$. Thus, with $k_1 = k_2 = 2$ in (7.5),

$$-n\ln\frac{d_2(X^{(2)}_{2NN},X^{(1)}_k)}{d_1(X^{(1)}_{1NN},X^{(1)}_k)} - \ln\frac{N_2}{N_1} - \ln\frac{|\Sigma_2|^{1/2}}{|\Sigma_1|^{1/2}} \;\underset{\omega_2}{\overset{\omega_1}{\lessgtr}}\; t$$

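On the other hand, in the L method, $X_k^{(1)}$ is no longer considered a member of the design set. Therefore, $X_{1NN}^{(1)}$ and $X_{2NN}^{(1)}$ are selected as the nearest and second nearest neighbors of $X_k^{(1)}$ from $\omega_1$, and the $\omega_1$ design set contains $N_1 - 1$ samples. The selection of $\omega_2$ neighbors is the same as before. Thus,

$$-n\ln\frac{d_2(X^{(2)}_{2NN},X^{(1)}_k)}{d_1(X^{(1)}_{2NN},X^{(1)}_k)} - \ln\frac{N_2}{N_1-1} - \ln\frac{|\Sigma_2|^{1/2}}{|\Sigma_1|^{1/2}} \;\underset{\omega_2}{\overset{\omega_1}{\lessgtr}}\; t$$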

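The R and L procedures can be sketched as a single error-counting loop over both classes. This is a hypothetical illustration, not the book's code: the helper names `kth_nn_dist` and `error_estimate` are made up, $\Sigma_i$ is estimated once from the full $S_i$ and held fixed in both methods (a stricter L method would also re-estimate $\Sigma_i$ without the test sample), and $k_1 = k_2 = k \ge 2$. The only differences between the methods are whether the test sample stays in its own design set (in the R method it becomes its own nearest neighbor) and whether that design set counts $N_i$ or $N_i - 1$ samples.

```python
import math
import numpy as np


def kth_nn_dist(X, design, cov_inv, k):
    """kth smallest d(Y, X) over the rows Y of `design`, d^2 = (Y-X)^T cov_inv (Y-X)."""
    D = design - X
    d2 = np.einsum("ij,jk,ik->i", D, cov_inv, D)
    return math.sqrt(np.partition(d2, k - 1)[k - 1])


def error_estimate(S1, S2, k, t=0.0, method="R"):
    """Fraction of S1 and S2 samples misclassified by test (7.5), k1 = k2 = k >= 2.

    method="R": the test sample stays in its own design set (it is its own 1st NN).
    method="L": the test sample is removed, leaving N_i - 1 own-class samples.
    Assumption: Sigma_i is estimated once from the full S_i and held fixed.
    """
    if method not in ("R", "L"):
        raise ValueError("method must be 'R' or 'L'")
    sets = (S1, S2)
    n = S1.shape[1]
    covs = [np.cov(S, rowvar=False) for S in sets]
    invs = [np.linalg.inv(C) for C in covs]
    logdets = [np.linalg.slogdet(C)[1] for C in covs]
    errors = 0
    for own in (0, 1):                      # 0 -> omega_1, 1 -> omega_2
        other = 1 - own
        for idx, X in enumerate(sets[own]):
            own_design = np.delete(sets[own], idx, axis=0) if method == "L" else sets[own]
            d, N = [0.0, 0.0], [0, 0]
            d[own] = kth_nn_dist(X, own_design, invs[own], k)
            d[other] = kth_nn_dist(X, sets[other], invs[other], k)
            N[own], N[other] = len(own_design), len(sets[other])
            # Left-hand side of (7.5) with the (k - 1) factors cancelled.
            h = (-n * math.log(d[1] / d[0])
                 - math.log(N[1] / N[0])
                 - 0.5 * (logdets[1] - logdets[0]))
            decided = 0 if h < t else 1
            errors += int(decided != own)
    return errors / (len(S1) + len(S2))
```

Because the own-class distance in the L method is measured to the kth NN among the remaining samples, while in the R method it is effectively measured to the $(k-1)$th, one would usually expect `error_estimate(..., method="R")` to come out no larger than `error_estimate(..., method="L")` on the same data.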