Edited Nearest Neighbor
Edited kNN: As seen in Fig. 7-3 for the voting kNN approach, the ω_i-samples in the ω_j-region (i ≠ j) could become close neighbors of some of the ω_j-samples, and this produces extra errors. This is the reason why the kNN error is larger than the Bayes error. These extra errors could be reduced by removing the ω_i-samples in the ω_j-region (samples 1, 3, 5 and 6 of Fig. 7-3) from the design sample set. In practice, since the exact location of the Bayes boundary is never known, the above operation is replaced by removing the samples misclassified by the kNN classification (samples 1, 2, 3, 4, 5 and 6 of Fig. 7-3) [21]. The resulting design set is called the edited set. Test samples are classified by the kNNs from the edited set. This algorithm is called the edited
kNN procedure. It should be noted, however, that some of the ω_i-samples in the ω_j-region are correctly classified and therefore not removed, and some of the ω_j-samples in the ω_j-region are misclassified and therefore removed. Consequently, the resulting error is not the same as the Bayes error.
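A minimal NumPy sketch of this procedure is given below; the function names edit_design_set and classify_nn, the two-Gaussian example data, and the choice k = 3 are illustrative assumptions rather than anything specified in the text. The design set is first edited by a kNN vote among the remaining design samples, and test samples are then assigned the class of their nearest neighbor in the edited set.

    import numpy as np

    def edit_design_set(X, y, k=3):
        """Remove design samples misclassified by a kNN vote among the other design samples."""
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        np.fill_diagonal(d2, np.inf)                           # a sample is not its own neighbor
        keep = np.empty(len(X), dtype=bool)
        for i in range(len(X)):
            nbrs = np.argsort(d2[i])[:k]                       # k nearest design samples to sample i
            votes = np.bincount(y[nbrs], minlength=2)
            keep[i] = (votes.argmax() == y[i])                 # keep only correctly classified samples
        return X[keep], y[keep]                                # the edited set

    def classify_nn(X_edit, y_edit, X_test):
        """Classify test samples by the class of the nearest neighbor from the edited set."""
        d2 = ((X_test[:, None, :] - X_edit[None, :, :]) ** 2).sum(-1)
        return y_edit[d2.argmin(axis=1)]

    # Example: two overlapping Gaussian classes
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(2.0, 1.0, (200, 2))])
    y = np.repeat([0, 1], 200)
    X_edit, y_edit = edit_design_set(X, y, k=3)
    print(len(X_edit), "of", len(X), "design samples kept after editing")
    print(classify_nn(X_edit, y_edit, X[:5]))

A larger k in the editing step makes the decision about which design samples to discard less sensitive to a single stray neighbor, at the cost of more computation.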
Asymptotic analysis: In order to analyze the edited kNN procedure, let us study the simplest case, the asymptotic NN. In the original sample set, the populations of ω_1 and ω_2 given X are represented by the a posteriori probabilities q_1(X) and q_2(X). Applying the voting NN classification to the ω_i-samples, q_i^2(X) (that is, q_i(X) q_i(X_NN), with X_NN → X asymptotically) are correctly classified, and q_i(X) q_j(X) (that is, q_i(X) q_j(X_NN)) (j ≠ i) are misclassified. Therefore, keeping only the correctly classified samples, the a posteriori probabilities in the edited set, q_i'(X), become

    q_i'(X) = q_i^2(X) / [q_1^2(X) + q_2^2(X)] ,    i = 1, 2 .        (7.84)
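For instance, at a point where q_1(X) = 0.7 and q_2(X) = 0.3, (7.84) gives q_1'(X) = 0.49/0.58 ≈ 0.84 and q_2'(X) = 0.09/0.58 ≈ 0.16: editing sharpens the a posteriori probabilities in favor of the locally dominant class, which is why the NN rule applied to the edited set behaves more like the Bayes classifier.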
Samples from the original set are classified according to the class of the NN
from the edited set. The resulting error is

    ε = E{ q_1(X) q_2'(X) + q_2(X) q_1'(X) } = E{ q_1(X) q_2(X) / [q_1^2(X) + q_2^2(X)] } ,        (7.85)

where the second form follows from (7.84) and q_1(X) + q_2(X) = 1.
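These asymptotic expressions are easy to check numerically; the sketch below (illustrative, not from the text) compares, for several values of q_1(X), the conditional Bayes error min(q_1, q_2), the asymptotic NN error 2 q_1(X) q_2(X) of the unedited rule, and the edited-NN conditional error of (7.85). The edited-NN value lies between the other two, consistent with the remark above that editing reduces, but does not eliminate, the excess over the Bayes error.

    import numpy as np

    q1 = np.linspace(0.5, 1.0, 6)          # a posteriori probability q_1(X) at a given X
    q2 = 1.0 - q1

    bayes  = np.minimum(q1, q2)            # conditional Bayes error
    nn     = 2.0 * q1 * q2                 # asymptotic conditional error of the plain NN rule
    edited = q1 * q2 / (q1**2 + q2**2)     # edited-NN conditional error from (7.85)

    for row in zip(q1, bayes, edited, nn):
        print("q1 = %.1f   Bayes = %.3f   edited NN = %.3f   NN = %.3f" % row)
    # the printed table shows Bayes <= edited NN <= NN for every q1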