
190               Compact numerical methods for computers
                           Newton method based on such a matrix would still require the solution of a linear
                           system of equations at each step. The variable metric methods generate an
                           approximate inverse Hessian as they proceed, requiring only one evaluation of the
                           first derivatives per step and, moreover, reduce the function value at each of these
                           steps.

                                              15.3. A CHOICE OF STRATEGIES
                           In order to specify a particular variable metric algorithm it is now necessary to
                           choose a matrix-updating formula and a linear search procedure. One set of
                           choices resulting in a compact algorithm is that of Fletcher (1970). Here this will
                           be simplified somewhat and some specific details made clear. First, Fletcher
                           attempts to avoid an inverse interpolation in performing the linear search. He
suggests an ‘acceptable point’ search procedure, which takes the first point in a
generated sequence of trial steps that satisfies an acceptance criterion. Suppose that the
step taken is

                         t = b' - b = -k B g                         (15.29)
with k = 1 initially. The decrease in the function

                         DS = S(b') - S(b)                           (15.30)

will be approximated for small steps t by the first term in the Taylor series for S
along t from b, that is, by t^T g. This is negative when t is a ‘downhill’ direction. It
is not desirable that DS be very different in magnitude from t^T g, since this would
imply that the local approximation of the function by a quadratic form is grossly
in error. By choosing k = 1, w, w^2, . . ., for 0 < w < 1 successively, it is always
possible to produce a t such that

                 0 < tolerance < DS / (t^T g)    for tolerance << 1  (15.31)
unless the minimum has been found. This presumes

                         t^T g < 0                                   (15.32)
                           to ensure a ‘downhill’ direction. In any practical program a test of the condition
                           (15.32) is advisable before proceeding with any search along t. Fletcher recom-
                           mends the values w=0·1, tolerance=0·0001. However, if the point is not
                           acceptable, he uses a cubic inverse interpolation to reduce k, with the proviso that
                           0·1k be the smallest value permitted to generate the next step in the search.
                           The author retains tolerance=0·0001, but uses w=0·2 with no interpolation
                           procedure. For a study of acceptable point-interpolation procedures in minimisa-
                           tion algorithms see Sargent and Sebastian (1972).
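The acceptable-point search just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's program: the function name is hypothetical, the step is represented as a fixed direction t scaled by a separate multiplier k (whereas in the text t itself absorbs k), and the k_min cutoff is an added safeguard. The values w = 0.2 and tolerance = 0.0001 follow the choices in the text.

```python
import numpy as np

def acceptable_point_search(S, b, t, g, w=0.2, tolerance=1e-4, k_min=1e-12):
    """Backtracking 'acceptable point' search along direction t from b.

    S is the objective function, g the gradient at b.  Trial multipliers
    k = 1, w, w^2, ... are tried until the realised change DS is an
    acceptable fraction of the linear prediction k * t^T g.
    """
    slope = float(t @ g)            # t^T g: predicted change per unit step
    if slope >= 0.0:                # test condition (15.32) before searching
        raise ValueError("t is not a downhill direction: t^T g >= 0")
    S0 = S(b)
    k = 1.0
    while k > k_min:
        b_new = b + k * t
        DS = S(b_new) - S0          # DS = S(b') - S(b), negative on success
        # acceptance test (15.31): 0 < tolerance < DS / (k t^T g)
        if DS / (k * slope) > tolerance:
            return b_new, k
        k *= w                      # shrink: k = 1, w, w^2, ...
    return b, 0.0                   # no acceptable point found
```

With the minimisation of S(b) = b^T b from b = (1, 1) along t = -g, the full step k = 1 overshoots (DS = 0) and is rejected, while k = w = 0.2 is accepted, illustrating how the sequence of multipliers backs off from an unacceptable point.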
                             An updating formula which has received several favourable reports is that of
                           Broyden (1970a, b), Fletcher (1970) and Shanno (1970). This employs the update

                 C = d2 t t^T - [t (By)^T + (By) t^T] / d1           (15.33)
                           where y is the gradient difference
                         y = g(b') - g(b)                            (15.34)
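The update (15.33) can be sketched as below. The scalars d1 and d2 are not defined in this excerpt; the sketch assumes the standard Broyden-Fletcher-Goldfarb-Shanno choice d1 = t^T y and d2 = (1 + y^T By / d1) / d1, which makes C the usual BFGS correction to the approximate inverse Hessian B.

```python
import numpy as np

def bfgs_update(B, t, y):
    """Return B + C, where C is the correction of equation (15.33):

        C = d2 t t^T - [t (By)^T + (By) t^T] / d1

    assuming d1 = t^T y and d2 = (1 + y^T B y / d1) / d1 (the standard
    BFGS scalars; this excerpt does not define them).
    """
    By = B @ y                                  # the vector By used in (15.33)
    d1 = float(t @ y)
    d2 = (1.0 + float(y @ By) / d1) / d1
    C = d2 * np.outer(t, t) - (np.outer(t, By) + np.outer(By, t)) / d1
    return B + C
```

A property worth checking is that the updated matrix B' satisfies the secant condition B' y = t, so that B' acts as an inverse Hessian along the most recent step; the correction also keeps B' symmetric.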