If there are few points around x, the region grows large and the volume V(n)
will also be large, therefore yielding a low density value. If there are many points
around x, in which case x is a high density point, the volume V(n) will be small.
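This behaviour follows directly from the k-NN density estimate (restated here for reference; it uses the same k(n), n and V(n) as above):

$$\hat{p}(x) = \frac{k(n)}{n\,V(n)},$$

so that, for a fixed number of neighbours, a large volume around x yields a small estimated density and a small volume yields a large one.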
The smoothing parameter in this case is k(n), which must grow with n. Fukunaga
and Hostetler (1973) derived an expression for k(n), which for normal distributions
is $k(n) = k_0\, n^{4/(d+4)}$. The constant $k_0$, initially 1.0 for d=1, decreases with d, to 0.75 for d=4
and to 0.42 for d=32. For instance, the two-class cork stoppers classification with
100 patterns would need about 7 neighbours when using 4 features. As we have
done with the h(n) factor in Parzen window estimation, the k(n) factor also has to
be adjusted experimentally.
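As a quick check of the cork stoppers figure above, taking $k_0 = 0.75$ for d = 4 and n = 100 in the expression for k(n):

$$k(n) = k_0\, n^{4/(d+4)} = 0.75 \times 100^{4/8} = 0.75 \times 10 = 7.5 \approx 7 \text{ neighbours}.$$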
When used as a pdf estimator, the k-nearest neighbours method, or k-NN method
for short, has the following shortcomings: the divergence to infinity of the integral
of the density estimate and the computational burden of the method. Rather than
using this method for pdf estimation, we will use it simply as a classification
method, according to the following nearest neighbour rule: consider the k(n) points
that are nearest neighbours of x, using a certain distance metric; the classification
of x is then the class label found in the majority of the k(n) neighbours. The
choice of distance metric results in different decision boundaries.
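As an illustration of this rule, the following minimal sketch classifies a pattern x by majority vote among its k nearest neighbours. It assumes a Euclidean distance metric and NumPy arrays for the training patterns; the names used are illustrative, not taken from the text.

import numpy as np
from collections import Counter

def knn_classify(x, patterns, labels, k):
    """Classify x by majority vote among its k nearest neighbours.
    patterns: (n, d) array of training patterns; labels: length-n sequence."""
    # Euclidean distance from x to every training pattern
    distances = np.linalg.norm(patterns - x, axis=1)
    # Indices of the k closest training patterns
    nearest = np.argsort(distances)[:k]
    # Majority class label among the k neighbours
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

With k = 1 this reduces to the nearest neighbour rule; replacing the Euclidean norm by another metric changes the resulting decision boundaries, as noted above.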
When applying the k-NN method, we are interested in knowing its attainable
performance in terms of its error rate Pe(k) for an arbitrarily large population, i.e.,
for n → ∞, compared with the optimal Bayes error Pe. Bounds on Pe(k) are
studied and presented in Duda and Hart (1973) and can be expressed, for two
classes and odd k(n), as:

$$P_e \le P_e(k) \le \sum_{i=0}^{(k-1)/2} \binom{k}{i} \left[ P_e^{\,i+1} (1-P_e)^{k-i} + P_e^{\,k-i} (1-P_e)^{i+1} \right].$$

In this formula, Pe is the Bayes error. In particular, for the 1-NN case, the error
has the following upper bound:

$$P_e(1) \le 2 P_e (1 - P_e).$$
These bounds are shown in Figure 4.30. As can be seen, Pe(k) approaches Pe
for increasing values of k, although this also demands large numbers of points, and is never
worse than 2Pe. Even in the case of k=1, the nearest neighbour rule, the
performance of the technique is quite remarkable. Bear in mind, however, that
Pe(k) is the asymptotic value of the overall error rate when the number of available
points grows towards infinity. A training set estimate of Pe(k) can have a
significant bias relative to this asymptotic value. This issue is discussed in detail in
Fukunaga and Hummels (1987) where the effects of dimensionality and metric are
presented. An important conclusion of this study is that the bias tends to decay