

have a guaranteed risk by summing up an empirical risk and a structural risk as depicted in Figure 5.36.

















Figure 5.36. Guaranteed risk of a neural network as a sum of empirical and structural risks.



For generalization purposes, we are interested in applying the principle of structural risk minimization, SRM. An experimental way of minimizing the structural risk consists of defining a sequence of networks of increasing complexity, obtained by adding more hidden neurons. For each network the empirical risk is minimized, and one progresses to a more complex machine until a minimum of the guaranteed risk is reached.
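This experimental SRM search can be summarized with a short Python sketch. It is only an illustration: scikit-learn's MLPClassifier stands in for the MLP, and the synthetic data set and the capacity-based penalty are assumptions introduced here, not the exact structural risk term of the text.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Illustrative data set (an assumption; any training set could be used instead).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
n = len(y)

best = None
for n_hidden in range(1, 16):                   # sequence of increasingly complex machines
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=2000,
                        random_state=0).fit(X, y)
    emp_risk = 1.0 - net.score(X, y)            # empirical risk: training-set error rate
    capacity = n_hidden * (X.shape[1] + 2) + 1  # rough count of free parameters
    # Placeholder structural term, growing with capacity and shrinking with n;
    # it is not the exact bound used in the text.
    struct_risk = np.sqrt(capacity * np.log(2.0 * n / capacity + 1.0) / n)
    guaranteed = emp_risk + struct_risk         # guaranteed risk = empirical + structural
    if best is None or guaranteed < best[0]:
        best = (guaranteed, n_hidden)

print("Selected number of hidden neurons:", best[1])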


5.7 Approximation Methods in NN Training


In section 5.5 we saw how to train MLPs using the back-propagation algorithm, based on a gradient descent technique. Some pitfalls of this technique were explained in that section, and in more detail in section 5.6.2, when we analysed the influence of the Hessian matrix on the learning process. There are several alternative algorithms for training MLPs that either attempt to improve the gradient descent technique used in the back-propagation algorithm, or use a completely different approach, not based on the gradient descent method. This last class of algorithms draws on the well-established body of multivariate function optimisation methods. The reader can find a detailed explanation of these techniques in (Fletcher, 1987) and their application to MLPs in (Bishop, 1995). In this section we present only two of these methods, which converge very fast and do not require the specification of parameters (learning rate and momentum factors) as the back-propagation algorithm does.
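For reference, the parameters in question are the learning rate and the momentum factor of the back-propagation weight update. The following Python fragment is a minimal sketch of that update rule; the symbols eta and alpha and the example values are assumptions used only for illustration.

import numpy as np

def momentum_update(w, grad, prev_delta, eta=0.1, alpha=0.9):
    # Back-propagation style update: delta_w = -eta * grad + alpha * previous delta_w.
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta

# One update step on an arbitrary gradient vector.
w = np.zeros(3)
prev_delta = np.zeros(3)
w, prev_delta = momentum_update(w, np.array([0.5, -0.2, 0.1]), prev_delta)

The two methods presented next avoid having to choose eta and alpha by hand.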