rather slowly with n, approximately as ah², particularly in high dimensions.
Therefore, the number of patterns required to lower the bias significantly may
be quite prohibitive.
Figure 4.30. Bounds on the asymptotic k-NN error rate, Pe(k), as a function of the
Bayes error Pe.
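As a point of reference for this figure, the classical Cover and Hart result for
the two-class 1-NN rule (k = 1), which is one instance of such bounds, states
that the asymptotic error lies between the Bayes error and twice the Bayes
error: Pe <= Pe(1) <= 2Pe(1 - Pe) <= 2Pe.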
Another difficulty with the k-NN method is deciding which patterns to keep in
the training set as "prototypes" for assessing the neighbourhoods. The edition
method aims to reduce the number of training set patterns, and therefore the
number of distances to be computed, while at the same time improving the
discrimination by discarding unreliable patterns from the training set. The
method consists of the following steps:
1. First, each pattern in the training set is classified using k neighbours from the
other training set patterns.
2. If the obtained classification is different from the original one, the pattern is
discarded from the training set. In this way a new, smaller training set is
obtained.
3. The test patterns are classified using the 1-NN rule and the new training set
derived in step 2.
With the edition method, the patterns that are near the boundaries and have
unreliable assignments based on the k-NN rule are eliminated. The error
probability of this method was shown by Devijver and Kittler (1980) to decrease
asymptotically with each successive edition in a sequence of editions.
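The three steps above can be sketched in Python with NumPy as follows. The
function names, the Euclidean distance metric and the 0/1 integer class labels
are illustrative assumptions; this is not the interface of the KNN program
mentioned below.

import numpy as np

def knn_predict(train_X, train_y, x, k):
    # classify a single pattern x by majority vote among its k nearest
    # training patterns (Euclidean distance)
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(d)[:k]
    return int(np.argmax(np.bincount(train_y[nearest])))

def edit_training_set(X, y, k):
    # Steps 1-2: classify each training pattern with the k-NN rule applied
    # to the remaining patterns; keep it only if the labels agree
    keep = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i          # leave pattern i out
        if knn_predict(X[mask], y[mask], X[i], k) == y[i]:
            keep.append(i)
    return X[keep], y[keep]

def classify_with_edited_set(X_train, y_train, X_test, k_edit):
    # Step 3: 1-NN classification of the test patterns on the edited set
    Xe, ye = edit_training_set(X_train, y_train, k_edit)
    return np.array([knn_predict(Xe, ye, x, 1) for x in X_test])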
The KNN program allows one to perform k-NN experiments for two-class
situations. It was used to perform 50 runs of k-NN classification for the cork
stoppers data, with features N and PRTIO, using the leave-one-out method (see
section 4.5). An average error rate of 9% was obtained, near the value reported
previously for the normal classifier and with a similar 95% confidence interval.
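A leave-one-out estimate of the k-NN error rate, of the kind used in this
experiment, can be sketched as shown below. The synthetic two-class data merely
stand in for the cork stoppers features (N, PRTIO); the actual data set and the
KNN program itself are not reproduced here.

import numpy as np

def loo_knn_error(X, y, k):
    # leave-one-out estimate: each pattern is classified by the k-NN rule
    # applied to all remaining patterns, and misclassifications are counted
    errors = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        d = np.linalg.norm(X[mask] - X[i], axis=1)
        nearest = np.argsort(d)[:k]
        predicted = int(np.argmax(np.bincount(y[mask][nearest])))
        errors += int(predicted != y[i])
    return errors / len(X)

# illustrative usage with synthetic two-class, two-feature data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(1.5, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(loo_knn_error(X, y, k=3))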