
NONPARAMETRIC LEARNING

            \[
            E_1 \;\le\; E_{\min}\left(2 - \frac{K}{K-1}\,E_{\min}\right) \;\le\; 2E_{\min}
            \tag{5.33}
            \]

            Apparently, replacing the true probability densities with estimations
            based on the first nearest neighbour gives an error rate that is at most
            twice the minimum. Thus, at least half of the classification information
            in a dense training set is contained in the first nearest neighbour.
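              For example (the numbers are illustrative and not taken from the text):
            with K = 3 classes and a Bayes error of E_min = 0.1, the bound (5.33) gives
            \[
            E_1 \;\le\; 0.1\left(2 - \tfrac{3}{2}\times 0.1\right) = 0.185 ,
            \]
            which indeed stays below the looser bound 2E_min = 0.2.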
              In the two-class problem (K = 2) the following bound can be proven:

            \[
            \begin{cases}
            E_\kappa \;\le\; E_{\min} + \dfrac{E_1}{\sqrt{0.5(\kappa - 1)}} & \text{if $\kappa$ is odd}\\[1.5ex]
            E_\kappa \;=\; E_{\kappa-1} & \text{if $\kappa$ is even}
            \end{cases}
            \tag{5.34}
            \]

            The importance of (5.34) is that it shows that the performance of the
            κ-NNR approximates the optimum as κ increases. This asymptotic
            optimality holds true only if the training set is dense (N_S → ∞).
            Nevertheless, even for the small training set given in Figure 5.5 it can
            be seen that the 7-NNR is superior to the 1-NNR. The topology of the
            compartments in Figure 5.5(b) is very specific to the given training set.
            This is in contrast with the topology of the 7-NNR in Figure 5.5(a). The
            7-NNR generalizes better than the 1-NNR.
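              A minimal plain-MATLAB sketch of this comparison is given below. It is
            not the listing that produced Figure 5.5; the synthetic data set, the
            variable names and the brute-force majority vote are our own
            illustration of the κ-NN rule.

            % kappa-NN classification of one object z, for kappa = 1 and kappa = 7
            rng(1);                                % fix the random seed (illustrative data)
            N = 40;                                % samples per class
            X = [randn(N,2); randn(N,2) + 2];      % class 1 around (0,0), class 2 around (2,2)
            y = [ones(N,1); 2*ones(N,1)];          % class labels of the training samples
            z = [1.2 0.8];                         % object to be classified
            for kappa = [1 7]
                d        = sum((X - repmat(z, 2*N, 1)).^2, 2);  % squared Euclidean distances to z
                [~, ind] = sort(d);                             % rank training samples by distance
                votes    = y(ind(1:kappa));                     % labels of the kappa nearest neighbours
                fprintf('kappa = %d: assigned class %d\n', kappa, mode(votes));
            end

            Larger values of κ average the vote over a larger neighbourhood, which
            is why, given enough training samples, the decision regions become
            smoother and generalize better.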
              Unfortunately, κ-NNR classifiers also have some serious disadvantages:

              • A distance measure is needed in order to decide which sample in the
                training set is nearest. Usually the Euclidean distance measure is
                chosen, but this choice need not be optimal; a small example follows
                this list. A systematic method to determine the optimal measure is
                hard to find.
              • The optimality is reached only when κ → ∞. But since at the same
                time it is required that κ/N_S → 0, the demand on the size of the
                training set is very high. If the size is not large enough, κ-NNR
                classification may be far from optimal.
              • If the training set is large, the computational complexity of κ-NNR
                classification becomes a serious burden.
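              The influence of the distance measure, mentioned in the first item
            above, can be made concrete with a small sketch (the numbers are our
            own illustration): the nearest neighbour of the same object, and hence
            its 1-NN label, differs for the Euclidean and the city-block distance.

            X = [0 3; 2 2];          % two training samples (one per row)
            y = [1; 2];              % their class labels
            z = [0 0];               % object to be classified
            d_euc  = sqrt(sum((X - repmat(z,2,1)).^2, 2));  % Euclidean distances: [3.00; 2.83]
            d_city = sum(abs(X - repmat(z,2,1)), 2);        % city-block distances: [3; 4]
            [~, i_euc]  = min(d_euc);    % nearest under the Euclidean metric: sample 2
            [~, i_city] = min(d_city);   % nearest under the city-block metric: sample 1
            fprintf('Euclidean: class %d, city-block: class %d\n', y(i_euc), y(i_city));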

            Many attempts have been made to remedy the last drawback. One method
            is to design fast algorithms with suitably chosen data structures (e.g.
            hierarchical data structures combined with a pre-ordered training set).
              Another method is to preprocess the training set so as to speed up the
            search for nearest neighbours. There are two principles on which this
            reduction can be based: editing and condensing. The first principle is to