
In order to understand the main aspects of the method, explained in detail in Bishop (1995), let us rewrite (5-2a) as a squared Euclidean norm of an n-dimensional error vector:

$$E = \frac{1}{2}\left\|\mathbf{e}\right\|^{2} = \frac{1}{2}\sum_{i=1}^{n} e_i^{2}. \qquad (5\text{-}63)$$

The error vector e depends on the weights and, for small deviations of the weights during the training process, $\mathbf{w}^{(\mathrm{new})} - \mathbf{w}^{(\mathrm{old})}$, it can be approximated by a first-order Taylor series as:

$$\mathbf{e}^{(\mathrm{new})} = \mathbf{e}^{(\mathrm{old})} + Z\left(\mathbf{w}^{(\mathrm{new})} - \mathbf{w}^{(\mathrm{old})}\right), \qquad (5\text{-}64)$$

where Z is the matrix of the error derivatives:

$$\left(Z\right)_{ij} = \frac{\partial e_i}{\partial w_j}. \qquad (5\text{-}65)$$
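Concretely, Z is an n×W Jacobian: one row per error component, one column per weight. As a minimal illustration (not from the text), Z can be estimated by central finite differences, assuming a generic helper `residuals(w)` that returns the error vector e for a weight vector w:

```python
import numpy as np

def error_jacobian(residuals, w, h=1e-6):
    """Finite-difference estimate of Z, with Z[i, j] = d e_i / d w_j.

    residuals: function mapping a weight vector w to the error vector e.
    w:         current weight vector (length W).
    Returns an (n, W) matrix, one column per weight.
    """
    e0 = residuals(w)
    Z = np.zeros((e0.size, w.size))
    for j in range(w.size):
        dw = np.zeros_like(w)
        dw[j] = h
        # Central difference approximation of the j-th column of Z.
        Z[:, j] = (residuals(w + dw) - residuals(w - dw)) / (2.0 * h)
    return Z
```

In practice Z is usually obtained by backpropagation rather than finite differences, but the matrix it produces is the same object defined in (5-65).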

Substituting (5-64) in (5-63) we get the error approximation:

$$E = \frac{1}{2}\left\|\mathbf{e}^{(\mathrm{old})} + Z\left(\mathbf{w}^{(\mathrm{new})} - \mathbf{w}^{(\mathrm{old})}\right)\right\|^{2}. \qquad (5\text{-}66)$$

Minimizing the error with respect to the new weights yields:

$$\mathbf{w}^{(\mathrm{new})} = \mathbf{w}^{(\mathrm{old})} - \left(Z^{T}Z\right)^{-1}Z^{T}\,\mathbf{e}^{(\mathrm{old})}. \qquad (5\text{-}67)$$
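As a hedged sketch of (5-67), the update can be computed with a least-squares solve, which applies the pseudo-inverse $(Z^{T}Z)^{-1}Z^{T}$ without forming it explicitly (the function and variable names below are illustrative):

```python
import numpy as np

def gauss_newton_step(w_old, e_old, Z):
    """One step of (5-67): w_new = w_old - (Z^T Z)^{-1} Z^T e_old.

    Solving the least-squares problem min ||Z dw - e_old||^2 applies the
    pseudo-inverse of Z to e_old, and is better conditioned than
    inverting Z^T Z explicitly.
    """
    dw, *_ = np.linalg.lstsq(Z, e_old, rcond=None)
    return w_old - dw
```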
The term $\left(Z^{T}Z\right)^{-1}Z^{T}$ is a pseudo-inverse matrix, as in (5-3), and can be computed using the Hessian matrix, since $Z^{T}Z$ is the usual outer-product approximation of the Hessian of E. This pseudo-inverse matrix governs the step size of the iterative learning process. In order to keep the deviation of the weights sufficiently small, so that the Taylor series approximation is valid, the Levenberg-Marquardt algorithm uses instead of (5-67) a modified error formula:

$$E = \frac{1}{2}\left\|\mathbf{e}^{(\mathrm{old})} + Z\left(\mathbf{w}^{(\mathrm{new})} - \mathbf{w}^{(\mathrm{old})}\right)\right\|^{2} + \lambda\left\|\mathbf{w}^{(\mathrm{new})} - \mathbf{w}^{(\mathrm{old})}\right\|^{2}. \qquad (5\text{-}68)$$
Minimizing (5-68) with respect to the new weights gives the update $\mathbf{w}^{(\mathrm{new})} = \mathbf{w}^{(\mathrm{old})} - \left(Z^{T}Z + \lambda I\right)^{-1}Z^{T}\,\mathbf{e}^{(\mathrm{old})}$; the step size is now governed by $\left(Z^{T}Z + \lambda I\right)^{-1}Z^{T}$. Far away from a minimum of E the step computed from (5-67) would be large, so a high λ is needed in order to maintain a small deviation of the weights. In the Levenberg-Marquardt algorithm, an appropriate value of λ is chosen during the training process in order to maintain appropriate learning steps with small deviations of the weights, therefore assuring the validity of the linear approximation used in (5-68).
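The complete iteration, including the usual heuristic for adapting λ (raise it when the error increases, lower it after a successful step), can be sketched as follows. This is only an illustration under conventional assumptions (initial λ = 10⁻³, adjustment factor 10), not the book's implementation; `residuals` and `jacobian` are the illustrative helpers introduced above:

```python
import numpy as np

def lm_step(w, residuals, jacobian, lam):
    """Damped update: w_new = w - (Z^T Z + lam*I)^{-1} Z^T e."""
    e = residuals(w)
    Z = jacobian(residuals, w)
    A = Z.T @ Z + lam * np.eye(w.size)
    return w - np.linalg.solve(A, Z.T @ e)

def levenberg_marquardt(w, residuals, jacobian, lam=1e-3, n_iter=100):
    """Adapt lam so that each step keeps the weight deviation small."""
    E = 0.5 * np.sum(residuals(w) ** 2)
    for _ in range(n_iter):
        w_try = lm_step(w, residuals, jacobian, lam)
        E_try = 0.5 * np.sum(residuals(w_try) ** 2)
        if E_try < E:
            # Step reduced the error: accept it and damp less.
            w, E, lam = w_try, E_try, lam / 10.0
        else:
            # Error grew: reject the step and damp more heavily.
            lam *= 10.0
    return w
```

Raising λ shrinks the update toward a small gradient-descent step, which is precisely the small-deviation regime in which the linear approximation (5-64) remains valid.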