


class C. Although the error figures for the training, verification and test sets were quite similar, we must take into account that formula (5-56) for complex MLPs indicates a number of patterns needed for sufficient generalization of approximately w/Pe, which in this case corresponds to 3392 patterns, a higher number of patterns than we have available for training. The previous error figures may, therefore, be somewhat optimistic.
  When using the conjugate-gradient method in high-dimensional feature spaces, it is often found that the algorithm gets stuck in local minima in the early steps of the process. It may be helpful in such cases to use back-propagation in the early steps (e.g. 20 epochs) and conjugate gradient afterwards.
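  As a concrete illustration of this hybrid strategy, the following Python sketch trains a small MLP with plain batch gradient descent (back-propagation) for 20 warm-up epochs and then hands the weights over to a conjugate-gradient optimiser. The synthetic data, the 3:6:1 layer sizes (borrowed from Figure 5.39), the learning rate and the epoch counts are illustrative assumptions, not the settings used in the experiments reported here.

    # Hybrid training sketch: back-propagation warm-up, then conjugate gradient.
    # All data, sizes and learning parameters are illustrative assumptions.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))                      # 3 input features
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)   # noisy regression target

    n_in, n_hid = 3, 6                                 # MLP 3:6:1, as in Figure 5.39

    def unpack(w):
        """Split the flat parameter vector into weight matrices and biases."""
        i = 0
        W1 = w[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
        b1 = w[i:i + n_hid]; i += n_hid
        W2 = w[i:i + n_hid]; i += n_hid
        b2 = w[i]
        return W1, b1, W2, b2

    def loss_and_grad(w):
        """Sum-of-squares error and its gradient (back-propagation)."""
        W1, b1, W2, b2 = unpack(w)
        h = np.tanh(X @ W1 + b1)                       # hidden activations
        out = h @ W2 + b2                              # linear output unit
        err = out - y
        E = 0.5 * np.sum(err ** 2)
        # backward pass
        gW2 = h.T @ err
        gb2 = np.sum(err)
        g_h = np.outer(err, W2) * (1.0 - h ** 2)
        gW1 = X.T @ g_h
        gb1 = g_h.sum(axis=0)
        grad = np.concatenate([gW1.ravel(), gb1, gW2, [gb2]])
        return E, grad

    w = rng.normal(scale=0.1, size=n_in * n_hid + n_hid + n_hid + 1)

    # 1) a few epochs of plain gradient descent to move away from poor regions
    lr = 1e-3
    for epoch in range(20):
        E, g = loss_and_grad(w)
        w -= lr * g

    # 2) conjugate-gradient refinement starting from the warmed-up weights
    res = minimize(loss_and_grad, w, jac=True, method='CG',
                   options={'maxiter': 200})
    print('final sum-of-squares error:', res.fun)

  Starting the conjugate-gradient search from weights already improved by a few back-propagation epochs tends to keep it away from the poor local minima often found when starting from random weights.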
  As an illustration of the conjugate-gradient method applied to a regression task, we used the foetal weight dataset analysed in section 5.5.2, using the same number of epochs and set sizes. Although the algorithm got stuck in local minima in some runs, we did obtain, in general, better solutions than with back-propagation. Figure 5.39 shows the result for one of the best runs, with RMS errors of 273 g for the training set, 267 g for the verification set and 287.1 g for the test set.
  Notice particularly, in Figure 5.39, the better adjustment in the high foetal weight region.
  The foetal weight estimation problem can also be approached as a classification task (see Exercise 5.22).




















Figure 5.39. Predicted foetal weight (PR-FW) using an MLP3:6:1 trained with the conjugate-gradient algorithm. The FW curve represents the true foetal weight values.



                                5.7.2 The Levenberg-Marquardt Method

The Levenberg-Marquardt method is also a fast training method, especially designed for a sum-squared error formula as in (5-2a) and single-output networks.
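As a minimal preview of the update this method performs, the sketch below computes one damped Gauss-Newton step, w - (J^T J + mu I)^{-1} J^T e, for a single-output network trained with a sum-of-squares error. Here J is the Jacobian of the residuals with respect to the weights, e is the residual vector (network outputs minus targets) and mu is the damping factor; the function name and the fixed mu are illustrative assumptions rather than the book's notation.

    # One Levenberg-Marquardt (damped Gauss-Newton) step; illustrative sketch.
    import numpy as np

    def lm_step(w, J, e, mu):
        """Return w - (J^T J + mu I)^{-1} J^T e for residuals e = outputs - targets."""
        A = J.T @ J + mu * np.eye(J.shape[1])   # damped approximation to the Hessian
        delta = np.linalg.solve(A, J.T @ e)     # solve the linear system rather than inverting
        return w - delta

When mu is large the step approaches a small gradient-descent step; when mu is small it approaches the Gauss-Newton step, which accounts for the method's fast convergence near a minimum.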