Descent to a minimum II: conjugate gradients
Since the conjugate gradients methods are derived on the presumption that
they minimise a quadratic form in n steps, it is also necessary to suggest a method
for continuing the iterations after n steps. Some authors, for instance Polak and
Ribiere (1969), continue iterating with the same recurrence formula. However,
while the iteration matrix B in the variable metric algorithms can in a number of
situations be shown to tend towards the inverse Hessian H^{-1} in some sense, there
do not seem to be any similar theorems for conjugate gradients algorithms.
Fletcher and Reeves (1964) restart their method every (n+1) steps with
t_1 = -g_1                                                    (16.15)
while Fletcher (1972) does this every n iterations. Powell (1975a, b) has some
much more sophisticated procedures for restarting his conjugate gradients
method. I have chosen to restart every n steps or whenever the linear search can
make no progress along the search direction. If no progress can be made in the
first conjugate gradient direction, that of steepest descent, then the algorithm is
taken to have converged.
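As a rough illustration of this restart policy, the following Python sketch may help; the function and argument names (choose_direction, beta_fn, no_progress, t_was_steepest) are my own, not the book's code, and the update t = -g + beta*t is simply the standard conjugate gradients direction recurrence.

    def choose_direction(g, g_old, t, count, n, beta_fn, no_progress, t_was_steepest):
        # g, g_old: current and previous gradients (array-like); t: previous search direction.
        # beta_fn(g, g_old, t) supplies whichever recurrence formula is in use.
        if no_progress and t_was_steepest:
            return None, 0          # steepest descent made no progress: converged
        if no_progress or count >= n:
            return -g, 1            # restart with t_1 = -g_1, equation (16.15)
        return -g + beta_fn(g, g_old, t) * t, count + 1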
The linear search used in the first edition of this book was that of §13.2. However,
this proved to work well in some computing environments but poorly in others. The
present code uses a simpler search which first finds an ‘acceptable point’ by stepsize
reduction, using the same ideas as discussed in §15.3. Once an acceptable point has
been found, we have sufficient information to fit a parabola to the projection of the
function on the search direction. The parabola requires three pieces of information.
These are the function value at the end of the last iteration (or the initial point), the
projection of the gradient at this point onto the search direction, and the new (lower)
function value at the acceptable point. The step length resulting from the quadratic
inverse interpolation is used to generate a new trial point for the function. If this
proves to give a point lower than the latest acceptable point, it becomes the starting
point for the next iteration. Otherwise we use the latest acceptable point, which is the
lowest point so far.
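To make the interpolation step concrete, here is a small sketch of fitting the parabola from the three quantities just listed and taking its minimiser as the new trial step; the names (parabolic_step, f_along, slope0) are invented for illustration and do not come from the book's code.

    def parabolic_step(f_along, f0, slope0, a, fa):
        # f_along(alpha): function value at step alpha along the search direction
        # f0, slope0: function value and gradient projection at alpha = 0
        # a, fa: the acceptable step length and its (lower) function value
        c = (fa - f0 - slope0 * a) / (a * a)     # curvature of the fitted parabola
        if c <= 0.0:
            return a, fa                         # no interior minimum: keep the acceptable point
        alpha = -slope0 / (2.0 * c)              # minimiser of q(alpha) = f0 + slope0*alpha + c*alpha**2
        f_new = f_along(alpha)
        return (alpha, f_new) if f_new < fa else (a, fa)   # keep whichever point is lower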
A starting step length is needed for the search. In the Newton and variable metric
(or quasi-Newton) methods, we can use a unit step length which is ideal for the
minimisation of a quadratic function. However, for conjugate gradients, we do not
have this theoretical support. The strategy used here is to multiply the best step
length found in the line search by some factor to increase the step. Our own usual
choice is a factor 1.7. At the start of the conjugate gradients major cycle we set the
step length to 1. If the step length exceeds 1 after being increased at the end of each
iteration, it is reset to 1.
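A minimal sketch of this step-length strategy, with names (GROW, next_initial_step) of my own choosing:

    GROW = 1.7   # the usual increase factor mentioned above

    def next_initial_step(best_step, start_of_major_cycle):
        if start_of_major_cycle:
            return 1.0                          # reset at the start of a major cycle
        step = GROW * best_step                 # expand on the best step from the last search
        return 1.0 if step > 1.0 else step      # never begin with a step larger than 1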
If the choice of linear search is troublesome, that of a recurrence formula is
even more difficult. In some tests by the author on the 23-bit binary NOVA, the
Beale-Sorenson formula (16.8) in conjunction with the linear search chosen above
required more function and derivative evaluations than either formula (16.10) or
formula (16.11). A more complete comparison of the Polak-Ribiere formula
(16.10) with that of Fletcher-Reeves (16.11) favoured the former. However, it is
worth recording Fletcher’s (1972) comment: ‘I know of no systematic evidence
which indicates how the choice should be resolved in a general-purpose al-
gorithm.’ In the current algorithm, the user is given a choice of which approach
should be used.
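For reference, the standard textbook forms of the three recurrences discussed here, which I believe correspond to formulas (16.8), (16.10) and (16.11), can be sketched as follows; g_new and g_old are the gradients after and before the step, t is the previous search direction, and the function names are mine.

    import numpy as np

    def beta_beale_sorenson(g_new, g_old, t):        # cf. (16.8)
        y = g_new - g_old
        return np.dot(g_new, y) / np.dot(t, y)

    def beta_polak_ribiere(g_new, g_old, t):         # cf. (16.10)
        return np.dot(g_new, g_new - g_old) / np.dot(g_old, g_old)

    def beta_fletcher_reeves(g_new, g_old, t):       # cf. (16.11)
        return np.dot(g_new, g_new) / np.dot(g_old, g_old)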