class C. Although the error figures for the training, verification and test sets were
quite similar, we must take into account that formula (5-56) for complex MLPs
indicates a number of patterns needed for sufficient generalization of
approximately W/Pe, which in this case corresponds to 3392 patterns, more than
we have available for training. The previous error figures may, therefore, be
somewhat optimistic.
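As a concrete illustration of this rule of thumb, the short computation below applies the reading of formula (5-56) as n ≈ W/Pe, with W the total number of network weights and Pe the admissible error rate. The network dimensions and error rate used are hypothetical, chosen only to show the arithmetic, and are not the values of the experiment above.

# Rough application of the rule of thumb n ~ W/Pe from formula (5-56).
# All values below are hypothetical and only illustrate the computation.

def mlp_weights(n_inputs, n_hidden, n_outputs):
    """Total number of weights, including biases, of a one-hidden-layer MLP."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

W = mlp_weights(n_inputs=16, n_hidden=10, n_outputs=1)  # hypothetical MLP
Pe = 0.05                                               # hypothetical admissible error rate
print(f"W = {W} weights -> roughly {W / Pe:.0f} training patterns needed")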
When using the conjugate-gradient method in high-dimensional feature spaces,
the algorithm often gets stuck in local minima in the early steps of
the process. In such cases it may be helpful to use back-propagation in the early
steps (e.g. 20 epochs) and the conjugate-gradient method afterwards.
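The sketch below illustrates this hybrid strategy for a small single-hidden-layer MLP with a sum-of-squares error. The data, network size and epoch counts are illustrative, and SciPy's general-purpose conjugate-gradient minimizer stands in for whatever implementation was used in the experiments reported here.

# Minimal sketch of the hybrid strategy: a few epochs of plain gradient
# descent (back-propagation) to leave the difficult initial region, then
# conjugate gradients for the remaining iterations.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # hypothetical inputs
y = np.sin(X.sum(axis=1, keepdims=True))       # hypothetical targets

n_in, n_hid, n_out = 3, 6, 1                   # an MLP 3:6:1, as in Figure 5.39
shapes = [(n_in + 1, n_hid), (n_hid + 1, n_out)]
sizes = [np.prod(s) for s in shapes]

def unpack(w):
    W1 = w[:sizes[0]].reshape(shapes[0])
    W2 = w[sizes[0]:].reshape(shapes[1])
    return W1, W2

def forward(w, X):
    W1, W2 = unpack(w)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias input
    H = np.tanh(Xb @ W1)                       # hidden-layer activations
    Hb = np.hstack([H, np.ones((len(H), 1))])
    return Xb, H, Hb, Hb @ W2                  # linear output unit

def error(w):
    *_, out = forward(w, X)
    return 0.5 * np.sum((out - y) ** 2)        # sum-of-squares error

def gradient(w):                               # back-propagation of the error
    Xb, H, Hb, out = forward(w, X)
    W1, W2 = unpack(w)
    delta2 = out - y                           # output-layer deltas
    g2 = Hb.T @ delta2
    delta1 = (delta2 @ W2[:-1].T) * (1 - H ** 2)   # hidden-layer deltas
    g1 = Xb.T @ delta1
    return np.concatenate([g1.ravel(), g2.ravel()])

w = rng.normal(scale=0.1, size=sum(sizes))
for _ in range(20):                            # 20 epochs of gradient descent first
    w -= 0.01 * gradient(w)

# ... then conjugate gradients for the remaining iterations
res = minimize(error, w, jac=gradient, method="CG", options={"maxiter": 200})
print("final sum-of-squares error:", res.fun)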
As an illustration of the conjugate-gradient method applied to a regression task,
we used the foetal weight dataset analysed in section 5.5.2, with the same number
of epochs and set sizes. Although the algorithm got stuck in local minima in some
runs, we obtained, in general, better solutions than with back-propagation. Figure
5.39 shows the result of one of the best runs, with RMS errors of 273 g for the
training set, 267 g for the verification set and 287.1 g for the test set.
Notice particularly, in Figure 5.39, the better adjustment in the high foetal weight
region.
The foetal weight estimation problem can also be approached as a classification
task (see Exercise 5.22).
Figure 5.39. Predicted foetal weight (PR-FW) using an MLP 3:6:1 trained with the
conjugate-gradient algorithm. The FW curve represents the true foetal weight
values.
5.7.2 The Levenberg-Marquardt Method
The Levenberg-Marquardt method is also a fast training method, especially
designed for a sum-of-squares error formula as in (5-2a) and for single-output
networks.
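To anticipate how such a method operates, the sketch below implements a generic Levenberg-Marquardt step for a sum-of-squares error, solving (J^T J + lambda*I) dw = -J^T e for the weight update, where J is the Jacobian of the residual vector e with respect to the weights. The residual function and all parameter values below are placeholders, not the formulation used in this text.

# Minimal sketch of one Levenberg-Marquardt step for a sum-of-squares
# error E(w) = 0.5 * sum(e_i(w)^2), where e(w) is the vector of residuals
# (network output minus target for each pattern).  The residual function
# here is a placeholder; in practice e(w) is computed by the network.
import numpy as np

def lm_step(residuals, jacobian, w, lam):
    """One LM update: solve (J^T J + lam*I) dw = -J^T e."""
    e = residuals(w)
    J = jacobian(w)
    A = J.T @ J + lam * np.eye(len(w))
    return w + np.linalg.solve(A, -J.T @ e)

def numerical_jacobian(residuals, w, h=1e-6):
    """Finite-difference Jacobian of the residual vector."""
    e0 = residuals(w)
    J = np.zeros((len(e0), len(w)))
    for j in range(len(w)):
        wj = w.copy()
        wj[j] += h
        J[:, j] = (residuals(wj) - e0) / h
    return J

# Hypothetical linear residuals, just to exercise the step:
targets = np.array([1.0, 2.0, 3.0])
residuals = lambda w: w[0] * np.array([1.0, 2.0, 3.0]) + w[1] - targets

w = np.array([0.0, 0.0])
for _ in range(10):
    w = lm_step(residuals, lambda v: numerical_jacobian(residuals, v), w, lam=1e-2)
print(w)  # approaches the least-squares solution (1, 0)

For large lambda the update approaches a small gradient-descent step, while for small lambda it approaches the Gauss-Newton step; adapting lambda between these two regimes is what gives the method its speed.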