In order to understand the main aspects of the method, explained in detail in Bishop (1995), let us rewrite (5-2a) as a squared Euclidean norm of an n-dimensional error vector:
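In standard notation (a reconstruction, since the original display is not reproduced here; the numbering follows the later reference to (5-63)):

E = \tfrac{1}{2}\,\lVert \mathbf{e}(\mathbf{w}) \rVert^{2} = \tfrac{1}{2}\sum_{i=1}^{n} e_{i}(\mathbf{w})^{2} \qquad (5\text{-}63)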
The error vector e depends on the weights, and for small deviations of the weights during the training process, w^(new) - w^(old), it can be approximated by a first-order Taylor series as:
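In standard notation (a reconstruction; this is the expansion referenced later as (5-64)):

\mathbf{e}\big(\mathbf{w}^{(\mathrm{new})}\big) \approx \mathbf{e}\big(\mathbf{w}^{(\mathrm{old})}\big) + Z\big(\mathbf{w}^{(\mathrm{new})} - \mathbf{w}^{(\mathrm{old})}\big) \qquad (5\text{-}64)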
where Z is the matrix of the error derivatives:
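Presumably, in the standard form of this derivation:

Z_{ij} = \frac{\partial e_{i}}{\partial w_{j}}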
Substituting (5-64) into (5-63), we get the error approximation:
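Presumably (the original display is not reproduced here):

E \approx \tfrac{1}{2}\,\Big\lVert \mathbf{e}\big(\mathbf{w}^{(\mathrm{old})}\big) + Z\big(\mathbf{w}^{(\mathrm{new})} - \mathbf{w}^{(\mathrm{old})}\big) \Big\rVert^{2}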
Minimizing the error with respect to the new weights yields:
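In the standard Gauss-Newton form (a reconstruction; this is presumably the update referenced below as (5-67)):

\mathbf{w}^{(\mathrm{new})} = \mathbf{w}^{(\mathrm{old})} - \big(Z^{T}Z\big)^{-1} Z^{T}\, \mathbf{e}\big(\mathbf{w}^{(\mathrm{old})}\big) \qquad (5\text{-}67)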
The term (Z^T Z)^{-1} Z^T is a pseudo-inverse matrix, as in (5-3), and can be computed
using the Hessian matrix. This pseudo-inverse matrix governs the step size of the
iterative learning process. In order to keep the deviation of the weights sufficiently
small so that the Taylor series approximation is valid, the Levenberg-Marquardt
algorithm uses, instead of (5-67), a modified error formula:
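Presumably, following Bishop (1995), the modified formula numbered (5-68) penalizes the squared deviation of the weights (the factor on the penalty term is a convention chosen here so that minimization yields exactly the step operator quoted below):

E = \tfrac{1}{2}\,\Big\lVert \mathbf{e}\big(\mathbf{w}^{(\mathrm{old})}\big) + Z\big(\mathbf{w}^{(\mathrm{new})} - \mathbf{w}^{(\mathrm{old})}\big) \Big\rVert^{2} + \tfrac{1}{2}\,\lambda\,\Big\lVert \mathbf{w}^{(\mathrm{new})} - \mathbf{w}^{(\mathrm{old})} \Big\rVert^{2} \qquad (5\text{-}68)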
The step size is now governed by (Z^T Z + λI)^{-1} Z^T. Far away from a minimum of E a large learning step would otherwise be taken; a high λ is therefore needed, so that a small deviation of the weights is still maintained. In the Levenberg-Marquardt algorithm, an appropriate value of λ is chosen during the training process in order to maintain adequate learning steps with small deviations of the weights, therefore assuring the validity of the linear approximation used in (5-68).
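As an illustration of this adaptive scheme, a minimal sketch of the iteration is given below in Python/NumPy; the function names, the error and Jacobian callables, and the simple multiplicative rule for adjusting λ are illustrative assumptions, not taken from the text.

    import numpy as np

    def lm_step(w, errors, jacobian, lam):
        """One Levenberg-Marquardt update of the weight vector w.

        errors(w)   -> e, the n-dimensional error vector (illustrative callable)
        jacobian(w) -> Z, the n-by-p matrix of derivatives de_i/dw_j
        lam         -> current value of the parameter lambda
        """
        e = errors(w)
        Z = jacobian(w)
        # Solve (Z^T Z + lambda*I) dw = -Z^T e rather than forming the
        # pseudo-inverse explicitly; lambda controls the step size.
        A = Z.T @ Z + lam * np.eye(w.size)
        dw = np.linalg.solve(A, -Z.T @ e)
        return w + dw

    def train_lm(w, errors, jacobian, lam=1e-3, n_iter=100, factor=10.0):
        """Illustrative training loop: lambda is decreased after a successful
        step (error went down) and increased otherwise, keeping the weight
        deviations small so that the linear approximation remains valid."""
        sse = 0.5 * np.sum(errors(w) ** 2)
        for _ in range(n_iter):
            w_new = lm_step(w, errors, jacobian, lam)
            sse_new = 0.5 * np.sum(errors(w_new) ** 2)
            if sse_new < sse:      # step accepted: allow larger steps next time
                w, sse = w_new, sse_new
                lam /= factor
            else:                  # step rejected: shrink the step by raising lambda
                lam *= factor
        return w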