Page 222 - Compact Numerical Methods For Computers
P. 222
Minimising a nonlinear sum of squares 211
sufficiently well behaved for the unmodified Gauss-Newton algorithm to work
successfully, the linear search introduces some waste of effort. Furthermore, when
there are large differences in scale between the elements of the Jacobian J and/or
it is difficult to evaluate these accurately, then the condition of J J may deterior-
T
ate so that positive definiteness can no longer be guaranteed computationally and
the result (17.19) is inapplicable. Nevertheless, I have found that a carefully
coded implementation of Hartley’s (1961) ideas using the Choleski decomposition
and back-solution for consistent (but possibly singular) systems of linear equations
is a very compact and efficient procedure for solving the nonlinear least-squares
problem. However, it has proved neither so simple to implement nor so generally
reliable as the method which follows and, by virtue of its modifications to handle
singular J J, no longer satisfies the conditions of the convergence result (17.19).
T
17.4. MARQUARDT’S METHOD
The problems of both scale and singularity of J J are attacked simultaneously by
T
Marquardt (1963). Consider solutions q to the equations
T
(J J+e1) q=-J f (17.20)
T
where e is some parameter. Then as e becomes very large relative to the norm of
J J, q tends towards the steepest descents direction, while when e is very small
T
compared to this norm, the Gauss-Newton solution is obtained. Furthermore, the
scaling of the parameters
(17.21)
x' =Dx
where D is a diagonal matrix having positive diagonal elements, implies a
transformed Jacobian such that
J' =JD - 1 (17.22)
and equations (17.20) become
(17.23a)
T T
[(J') J'+e1]q'=-(J') f
(17.23b)
- 1 T -1 - 1 T
=(D J JD +e1)Dq= -D J f
or
(17.24)
T 2 T
(J J+eD )q= -J f
so that the scaling may be accomplished implicitly by solving (17.24) instead of
(17.23a).
Marquardt (1963) and Levenberg (1944) have suggested the scaling choice
(17.25)
However, to circumvent failures when one of the diagonal elements of J J is zero,
T
I prefer to use
(17.26)
where f is some number chosen to ensure the scale is not too small (see Nash
1977). A value of f=1 seems satisfactory for even the most pathological
problems. The matrix J J+eD 2 is always positive definite, and by choosing e
T