Page 200 - Compact Numerical Methods For Computers
P. 200

Descent to a minimum I: variable metric algorithms    189

                      is given by
                                               y = g j + l  - g =H(b  j+ l  -b )
                                                j
                                                        j
                                                                   j
                                                          =k Ht j                      (15.18)
                                                            j
                      since the elements of H are constant in this case. From this it follows that (15.15)
                      becomes
                                                     B (m )  y =k t .                  (15.19)
                                                             j j
                                                          j
                      Assuming (15.15) is correct for j<m, a new step
                                                         (m )
                                                     t =B   g                          (15.20)
                                                      m      m
                      is required to be conjugate to all previous directions t , i.e.
                                                                    j
                                                             for  j<m.                 (15.21)
                      But from (15.18), (15.19) and (15.20) we obtain
                                                                                       (15.22)

                        Suppose now that the linear searches at each step have been performed
                      accurately. Then the directions
                                                    t , t , . . . , t  m- l            (15.23)
                                                      2
                                                    1
                      define a hyperplane on which the quadratic form has been minimised. Thus g m  is
                      orthogonal to t ,  j<m, and (15.21) is satisfied, so that the new direction t m  is
                                   j
                      conjugate to the previous ones.
                        In order that B (m +l)  satisfies the above theory so that the process can continue,
                      the update C in
                                                   B (m + 1 ) =B  ( m ) +C             (15.24)

                      must satisfy (15.19), that is
                                                    (m + l )
                                                   B     y =k t                        (15.25)
                                                              m m
                                                          m
                      or
                                                 Cy =k t -B  ( m ) y  m                (15.26)
                                                    m
                                                        m m
                      and
                                            B (m + l ) y =k t  f o r  j<m              (15.27)
                                                      j j
                                                   j
                      or
                                                           (m )
                                                 Cy =k t -B   y
                                                    j  j j     j
                                                     = k t -k t =0.                    (15.28)
                                                            j j
                                                        j j
                                                                                          ( m )
                        In establishing (15.26) and (15.28), equation (15.19) has been applied for B  .
                      There are thus m conditions on the order-n matrix C. This degree of choice in C
                      has in part been responsible for the large literature on variable metric methods.
                        The essence of the variable metric methods, i.e. that information regarding
                      the Hessian has been drawn from first derivative computations only is somewhat
                      hidden in the above development. Of course, differences could have been used to
                      generate an approximate Hessian from (n+1) vectors of first derivatives, but a
   195   196   197   198   199   200   201   202   203   204   205