Page 187 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB
176 SUPERVISED LEARNING
at hand. In practice, we often train a number of neural networks of
varying complexity and compare their performance on an independent
validation set. The danger of finding an overly nonlinear decision boundary
is illustrated in Example 5.7.
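As a rough sketch of this procedure, the following PRTools fragment trains networks with several hidden-layer sizes and compares their error on a held-out validation set. It assumes that PRTools is on the path and that the nutsbolts data file used in Listing 5.8 provides the dataset z. The 50% split, the list of candidate sizes and the 500 training epochs are illustrative choices, not values prescribed by the text.

```matlab
% Sketch only: assumes PRTools and the nutsbolts dataset (variable z).
load nutsbolts;                     % Load the dataset z
[zt, zv] = gendat(z, 0.5);          % Random split: half of each class for training (assumed ratio)

units = [2 5 10 20];                % Hypothetical candidate hidden-layer sizes
e = zeros(size(units));             % Validation error for each candidate
for i = 1:numel(units)
    w = bpxnc(zt, units(i), 500);   % Train on the training part only
    e(i) = testc(zv*w);             % Fraction of misclassified validation samples
end

[emin, ibest] = min(e);             % Candidate with the lowest validation error
fprintf('Selected %d hidden units (validation error %.3f)\n', units(ibest), emin);
```

Because the split and the weight initialization are random, the selected size can differ from run to run; repeating the split a few times and averaging the errors would give a steadier comparison.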
Example 5.7 Classification of mechanical parts, neural networks
Figure 5.13 shows the decision boundaries found by training two
neural networks. The first network, whose decision boundary is
shown in Figure 5.13(a), contains one hidden layer of five units. This
gives a reasonably smooth decision boundary. For the decision function
shown in Figure 5.13(c), a network was used with two hidden
layers of 100 units each. This network has clearly found a highly
nonlinear solution, which does not generalize as well as that of the first
network. For example, the region assigned to nuts contains one outlying ‘x’
sample (scrap). The decision boundary bends heavily to include the
single ‘x’ sample within the scrap region. Although such a crumpled
curve decreases the squared error, it is undesirable because the outlying
‘x’ is not likely to occur again at that same location in other realizations.
Note also the spurious region in the bottom right of the plot, in
which samples are classified as ‘bolt’ (denoted by + in the scatter plot).
Here too, the network generalizes poorly as it has not seen any
examples in this region.
Figures 5.13(b) and (d) show the learning curves that were recorded
while training the networks. One epoch is a training period in which
the algorithm has cycled through all training samples. The figures
show ‘error’, which is the fraction of the training samples that are
erroneously classified, and ‘mse’, which is $2J_{SE}/(K N_S)$. The larger
the network is, the more epochs are needed before the minimum is
reached. However, it is sometimes better to stop training before
actually reaching the minimum, because the generalization ability can
deteriorate in the vicinity of the minimum.
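The two plotted quantities and the idea of stopping early can be illustrated in plain MATLAB. The sketch below rests on several assumptions: that J_SE is half the sum of squared differences between the network outputs and the targets, that K is the number of classes and N_S the number of training samples, and that a separate validation curve is used to decide when to stop. All numbers are made-up toy values, not data from the example.

```matlab
% Toy illustration; every number here is invented for the sketch.
% g: N_S-by-K network outputs, t: N_S-by-K one-of-K target vectors.
g = [0.9 0.1 0.0;
     0.2 0.7 0.1;
     0.6 0.3 0.1;
     0.1 0.2 0.7];
t = [1 0 0;
     0 1 0;
     0 1 0;
     0 0 1];
[NS, K] = size(g);

% Assumed definition: J_SE = 0.5 * sum of squared output errors.
JSE = 0.5 * sum((g(:) - t(:)).^2);
mse = 2 * JSE / (K * NS);             % Mean squared error per output

% 'error': fraction of samples whose largest output is not the target class.
[~, assigned] = max(g, [], 2);
[~, truth]    = max(t, [], 2);
err = mean(assigned ~= truth);

% Early stopping: keep the epoch with the lowest validation error
% rather than training until the training criterion is minimal.
valErr = [0.30 0.18 0.12 0.10 0.11 0.13 0.16];   % Hypothetical validation curve
[bestVal, bestEpoch] = min(valErr);

fprintf('mse = %.4f, error = %.2f, stop after epoch %d\n', mse, err, bestEpoch);
```

In this toy curve the validation error rises again after its lowest point, which is the situation in which continuing to the training minimum would hurt generalization.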
The figures were generated by the code shown in Listing 5.8.
Listing 5.8
PRTools code for training and plotting two neural network classifiers.
load nutsbolts;                 % Load the dataset
[w,R] = bpxnc(z,5,500);         % Train a small network
figure; scatterd(z); plotc(w);  % Plot the classifier