
            at hand. In practice, we often train a number of neural networks of
            varying complexity and compare their performance on an independent
            validation set, as sketched below. The danger of finding an overly
            nonlinear decision boundary is illustrated in Example 5.7.

              Example 5.7 Classification of mechanical parts, neural networks
              Figure 5.13 shows the decision boundaries found by training two
              neural networks. The first network, whose decision boundary is
              shown in Figure 5.13(a), contains one hidden layer of five units. This
              gives a reasonably smooth decision boundary. For the decision func-
              tion shown in Figure 5.13(c), a network was used with two hidden
              layers of 100 units each. This network has clearly found a highly
              nonlinear solution, which does not generalize as well as the first
              network. For example, the region assigned to the nut class contains
              one outlying ‘x’ sample (scrap). The decision boundary bends sharply
              so as to include this single ‘x’ sample in the scrap region. Although
              such a crumpled boundary decreases the squared error, it is undesirable,
              because an outlier is unlikely to occur again at that same location in
              other realizations.
                Note also the spurious region in the bottom right of the plot, in
              which samples are classified as ‘bolt’ (denoted by ‘+’ in the scatterplot).
              Here too, the network generalizes poorly, as it has not seen any
              examples in this region.
                Figures 5.13(b) and (d) show the learning curves that were recorded
              while training the networks. One epoch is a training period in which
              the algorithm has cycled once through all training samples. The figures
              show ‘error’, which is the fraction of the training samples that are
              erroneously classified, and ‘mse’, which is 2J_SE/(K N_S), where J_SE is
              the sum-of-squared-error criterion, K the number of classes and N_S the
              number of training samples. The larger the network, the more epochs are
              needed before the minimum is reached. However, it is sometimes better to
              stop training before the minimum is actually reached, because the
              generalization ability can degrade in its vicinity.
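                A rough sketch of this early-stopping idea is given below. It is
              not from the book; it assumes that bpxnc accepts a previously trained
              network as its fourth (initial-weights) argument so that training can
              be resumed, and the burst length and validation split are arbitrary
              choices made here.

              load nutsbolts;
              [z_trn,z_val] = gendat(z,0.5);    % Hold out a validation set
              w = bpxnc(z_trn,5,50);            % First burst of 50 epochs
              w_best = w; e_best = testc(z_val,w);
              for burst = 2:10                  % Continue training in bursts of 50 epochs
                w = bpxnc(z_trn,5,50,w);        % Resume from the current weights (assumed syntax)
                e = testc(z_val,w);
                if e < e_best                   % Keep the network that generalizes best
                  e_best = e; w_best = w;
                end
              end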
                The figures were generated by the code shown in Listing 5.8.


            Listing 5.8
            PRTools code for training and plotting two neural network classifiers.


            load nutsbolts;                          % Load the dataset
            [w,R] = bpxnc(z,5,500);                  % Train a small network
            figure; scatterd(z); plotc(w);           % Plot the classifier