
NONPARAMETRIC LEARNING

            \[
            E_1 \;\le\; E_{\min}\left(2 - \frac{K}{K-1}\,E_{\min}\right) \;\le\; 2E_{\min}
            \tag{5.33}
            \]

            Apparently, replacing the true probability densities with estimations
            based on the first nearest neighbour gives an error rate that is at most
            twice the minimum. Thus, at least half of the classification information
            in a dense training set is contained in the first nearest neighbour.
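              For example (the numbers are illustrative and not taken from the text):
            with K = 3 classes and a Bayes error of E_min = 0.1, the bound (5.33) gives
            \[
            E_1 \;\le\; 0.1\left(2 - \tfrac{3}{2}\times 0.1\right) = 0.185 ,
            \]
            which indeed stays below the looser bound 2E_min = 0.2.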
              In the two-class problem (K = 2) the following bound can be proven:

            \[
            \begin{cases}
            E_\kappa \;\le\; E_{\min} + \dfrac{E_1}{\sqrt{0.5(\kappa - 1)}} & \text{if $\kappa$ is odd}\\[1.5ex]
            E_\kappa \;=\; E_{\kappa-1} & \text{if $\kappa$ is even}
            \end{cases}
            \tag{5.34}
            \]

            The importance of (5.34) is that it shows that the performance of the
            κ-NNR approximates the optimum as κ increases. This asymptotic
            optimality holds true only if the training set is dense (N_S → ∞).
            Nevertheless, even for the small training set given in Figure 5.5 it can
            be seen that the 7-NNR is superior to the 1-NNR. The topology of the
            compartments in Figure 5.5(b) is very specific to the given training set.
            This is in contrast with the topology of the 7-NNR in Figure 5.5(a). The
            7-NNR generalizes better than the 1-NNR.
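              A minimal plain-MATLAB sketch of this comparison is given below. It is
            not the listing that produced Figure 5.5; the synthetic data set, the
            variable names and the brute-force majority vote are our own
            illustration of the κ-NN rule.

            % kappa-NN classification of one object z, for kappa = 1 and kappa = 7
            rng(1);                                % fix the random seed (illustrative data)
            N = 40;                                % samples per class
            X = [randn(N,2); randn(N,2) + 2];      % class 1 around (0,0), class 2 around (2,2)
            y = [ones(N,1); 2*ones(N,1)];          % class labels of the training samples
            z = [1.2 0.8];                         % object to be classified
            for kappa = [1 7]
                d        = sum((X - repmat(z, 2*N, 1)).^2, 2);  % squared Euclidean distances to z
                [~, ind] = sort(d);                             % rank training samples by distance
                votes    = y(ind(1:kappa));                     % labels of the kappa nearest neighbours
                fprintf('kappa = %d: assigned class %d\n', kappa, mode(votes));
            end

            Larger values of κ average the vote over a larger neighbourhood, which
            is why, given enough training samples, the decision regions become
            smoother and generalize better.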
              Unfortunately, κ-NNR classifiers also have some serious disadvantages:

              • A distance measure is needed in order to decide which sample in the
                training set is nearest. Usually the Euclidean distance measure is
                chosen, but this choice need not be optimal; a small example follows
                this list. A systematic method to determine the optimal measure is
                hard to find.
              • The optimality is reached only when κ → ∞. But since at the same
                time it is required that κ/N_S → 0, the demand on the size of the
                training set is very high. If the size is not large enough, κ-NNR
                classification may be far from optimal.
              • If the training set is large, the computational complexity of κ-NNR
                classification becomes a serious burden.
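              The influence of the distance measure, mentioned in the first item
            above, can be made concrete with a small sketch (the numbers are our
            own illustration): the nearest neighbour of the same object, and hence
            its 1-NN label, differs for the Euclidean and the city-block distance.

            X = [0 3; 2 2];          % two training samples (one per row)
            y = [1; 2];              % their class labels
            z = [0 0];               % object to be classified
            d_euc  = sqrt(sum((X - repmat(z,2,1)).^2, 2));  % Euclidean distances: [3.00; 2.83]
            d_city = sum(abs(X - repmat(z,2,1)), 2);        % city-block distances: [3; 4]
            [~, i_euc]  = min(d_euc);    % nearest under the Euclidean metric: sample 2
            [~, i_city] = min(d_city);   % nearest under the city-block metric: sample 1
            fprintf('Euclidean: class %d, city-block: class %d\n', y(i_euc), y(i_city));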

            Many attempts have been made to remedy the last drawback. One method
            is to design fast algorithms with suitably chosen data structures (e.g.
            hierarchical data structures combined with a pre-ordered training set).
              Another method is to preprocess the training set so as to speed up the
            search for nearest neighbours. There are two principles on which this
            reduction can be based: editing and condensing. The first principle is to