
Descent to a minimum II: conjugate gradients

                       Since the conjugate gradients methods are derived on the presumption that
                     they minimise a quadratic form in n steps, it is also necessary to suggest a method
                     for continuing the iterations after n steps. Some authors, for instance Polak and
                      Ribiere (1969), continue iterating with the same recurrence formula. However,
                      while the iteration matrix B in the variable metric algorithms can in a number of
                      situations be shown to tend towards the inverse Hessian H⁻¹ in some sense, there
                     do not seem to be any similar theorems for conjugate gradients algorithms.
                      Fletcher and Reeves (1964) restart their method every (n+1) steps with

                                                       t₁ = -g₁                                  (16.15)
                     while Fletcher (1972) does this every n iterations. Powell (1975a, b) has some
                      much more sophisticated procedures for restarting his conjugate gradients
                      method. I have chosen to restart every n steps or whenever the linear search can
                       make no progress along the search direction. If no progress can be made in the
                       first conjugate gradient direction, that of steepest descent, then the algorithm is
                       taken to have converged.
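                         A minimal sketch of this restart policy is given below. It is written in Python
                       rather than in the language of the algorithms in this book, and all of the names
                       (for example iters_since_restart and progress_made) are illustrative assumptions,
                       not part of the published code.

          import numpy as np

          def next_search_direction(g, t, iters_since_restart, n, progress_made):
              # Decide whether to restart, continue, or declare convergence.
              # g and t are numpy arrays; returns (direction or None, converged, counter).
              if not progress_made and iters_since_restart == 0:
                  # no progress along the steepest-descent (restart) direction:
                  # the algorithm is taken to have converged
                  return None, True, 0
              if not progress_made or iters_since_restart >= n:
                  # restart the major cycle with t_1 = -g_1, equation (16.15)
                  return -g, False, 0
              # otherwise keep the direction supplied by the recurrence formula
              return t, False, iters_since_restart + 1

          # example: a failed search part-way through a cycle forces a restart
          t_new, done, count = next_search_direction(
              np.array([0.5, -1.0]), np.array([1.0, 0.0]), 3, 4, False)
          # t_new is -g, done is False, count is 0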
                        The linear search used in the first edition of this book was that of §13.2. However,
                      this proved to work well in some computing environments but poorly in others. The
                      present code uses a simpler search which first finds an ‘acceptable point’ by stepsize
                      reduction, using the same ideas as discussed in §15.3. Once an acceptable point has
                      been found, we have sufficient information to fit a parabola to the projection of the
                      function on the search direction. The parabola requires three pieces of information.
                      These are the function value at the end of the last iteration (or the initial point), the
                      projection of the gradient at this point onto the search direction, and the new (lower)
                      function value at the acceptable point. The step length resulting from the quadratic
                      inverse interpolation is used to generate a new trial point for the function. If this
                      proves to give a point lower than the latest acceptable point, it becomes the starting
                      point for the next iteration. Otherwise we use the latest acceptable point, which is the
                      lowest point so far.
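                         The parabolic step can be written down directly from these three quantities.
                       The short Python sketch below shows one way of doing so; it is intended to
                       illustrate the interpolation under the assumptions just stated, not to reproduce
                       the code of the algorithm verbatim.

          def parabolic_step(f0, gproj, s, f1):
              # f0    : function value at the end of the last iteration (or the initial point)
              # gproj : gradient at that point projected onto the search direction
              #         (negative along a descent direction)
              # s     : step length of the acceptable point
              # f1    : (lower) function value at the acceptable point
              # Model the projected function as p(a) = f0 + gproj*a + c*a*a,
              # so that p(s) = f1 fixes the curvature c.
              c = (f1 - f0 - gproj * s) / (s * s)
              if c <= 0.0:
                  return s                     # no convex fit; keep the acceptable step
              return -gproj / (2.0 * c)        # minimiser of the fitted parabola

                       If the fitted curvature is not positive, the sketch simply keeps the acceptable
                       step, which corresponds to the lowest point found so far.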
                        A starting step length is needed for the search. In the Newton and variable metric
                      (or quasi-Newton) methods, we can use a unit step length which is ideal for the
                      minimisation of a quadratic function. However, for conjugate gradients, we do not
                      have this theoretical support. The strategy used here is to multiply the best step
                      length found in the line search by some factor to increase the step. Our own usual
                      choice is a factor 1.7. At the start of the conjugate gradients major cycle we set the
                      step length to 1. If the step length exceeds 1 after being increased at the end of each
                      iteration, it is reset to 1.
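                         As a concrete illustration (again only a sketch, with assumed names), the
                       starting step carried from one line search to the next could be computed as
                       follows:

          def next_initial_step(best_step, factor=1.7):
              # enlarge the best step found by the last line search, but never
              # start the next search with a step greater than 1
              step = factor * best_step
              return 1.0 if step > 1.0 else step

          initial_step = 1.0          # reset at the start of each major cycle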
                        If the choice of linear search is troublesome, that of a recurrence formula is
                      even more difficult. In some tests by the author on the 23-bit binary NOVA, the
                      Beale-Sorenson formula (16.8) in conjunction with the linear search chosen above
                      required more function and derivative evaluations than either formula (16.10) or
                      formula (16.11). A more complete comparison of the Polak-Ribiere formula
                      (16.10) with that of Fletcher-Reeves (16.11) favoured the former. However, it is
                      worth recording Fletcher’s (1972) comment: ‘I know of no systematic evidence
                      which indicates how the choice should be resolved in a general-purpose al-
                      gorithm.’ In the current algorithm, the user is given a choice of which approach
                      should be used.
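                         For reference, the commonly quoted forms of the two recurrences offered to
                       the user are sketched below in Python; the gradients are assumed to be numpy
                       vectors, and the correspondence with formulae (16.10) and (16.11) follows the
                       text above.

          import numpy as np

          def beta_polak_ribiere(g_new, g_old):
              # Polak-Ribiere choice, cf. formula (16.10)
              return g_new.dot(g_new - g_old) / g_old.dot(g_old)

          def beta_fletcher_reeves(g_new, g_old):
              # Fletcher-Reeves choice, cf. formula (16.11)
              return g_new.dot(g_new) / g_old.dot(g_old)

          # either beta feeds the recurrence  t_new = -g_new + beta * t_old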