Page 235 - Numerical Methods for Chemical Engineering

5 Numerical optimization



To make things general, let us use not the exact Hessian, but some real-symmetric approximation to it:

$$B^{[k]} \approx H(x^{[k]}), \qquad (B^{[k]})^{T} = B^{[k]} \tag{5.31}$$
The search direction is obtained by solving

$$B^{[k]} p^{[k]} = -\gamma^{[k]} \tag{5.32}$$
For small a^[k], we approximate the change in F(x) as

$$F\left(x^{[k]} + a^{[k]} p^{[k]}\right) - F\left(x^{[k]}\right) \approx a^{[k]}\, p^{[k]} \cdot \gamma^{[k]} + O\left[\left(a^{[k]}\right)^{2}\right] \tag{5.33}$$
But, from our update rule, $\gamma^{[k]} = -B^{[k]} p^{[k]}$; therefore,

$$F\left(x^{[k]} + a^{[k]} p^{[k]}\right) - F\left(x^{[k]}\right) \approx -a^{[k]}\, p^{[k]} \cdot B^{[k]} p^{[k]} + O\left[\left(a^{[k]}\right)^{2}\right] \tag{5.34}$$
We first try a^[k] = 1, and iteratively halve its value until we find that the descent criteria are satisfied. Thus, a^[k] eventually will be small enough for (5.34) to be valid, so that F(x) will be reduced if $p^{[k]} \cdot B^{[k]} p^{[k]} > 0$, i.e., if B^[k] is positive-definite (all of its eigenvalues are positive).
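As an illustration of this step-halving strategy, here is a minimal Python/NumPy sketch, assuming a generic smooth cost F; the Armijo sufficient-decrease test is used here as one concrete choice of descent criterion, and the function and parameter names are illustrative, not from the text:

```python
import numpy as np

def backtracking_step(F, x, p, gamma, a0=1.0, c=1e-4, max_halvings=30):
    """Start from a = a0 and halve it until the sufficient-decrease
    (Armijo) condition F(x + a p) <= F(x) + c a (p . gamma) holds."""
    a = a0
    slope = p @ gamma  # p . gamma < 0 for a descent direction
    for _ in range(max_halvings):
        if F(x + a * p) <= F(x) + c * a * slope:
            return a
        a *= 0.5
    return a

# Example: quadratic cost with B = H, so p is the exact Newton step
H = np.array([[3.0, 1.0], [1.0, 2.0]])
F = lambda x: 0.5 * x @ H @ x
x = np.array([1.0, 1.0])
gamma = H @ x                     # gradient of F at x
p = np.linalg.solve(H, -gamma)    # solve B p = -gamma, eq. (5.32)
a = backtracking_step(F, x, p, gamma)
```

For this quadratic the full Newton step a = 1 already satisfies the test, so no halving occurs; for a poor B or a strongly nonquadratic F, several halvings may be needed.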
Let x_min be a local minimum such that no neighboring points have a lower cost function. Then, for very small p,

$$F(x_{\min} + p) - F(x_{\min}) \approx \tfrac{1}{2}\, p^{T} H(x_{\min})\, p + O\left[|p|^{3}\right] \;\Rightarrow\; p^{T} H(x_{\min})\, p \ge 0 \tag{5.35}$$

where the first-order term vanishes because γ(x_min) = 0.
It is possible for H(x_min) to have one or more zero eigenvalues and for x_min still to be a local minimum, but there can be no negative eigenvalues. Thus, at (and very nearby) a minimum, the Hessian is at least positive-semidefinite. Away from the near vicinity of a local minimum, however, it is quite possible for the Hessian to have negative eigenvalues, so that the search direction generated by $H(x^{[k]})\, p^{[k]} = -\gamma(x^{[k]})$ is not a descent direction.
It is not necessary, however, that we use the exact Hessian when computing p^[k]. As γ(x_min) = 0, $B^{[k]} p^{[k]} = -\gamma(x_{\min})$ has for any nonsingular B^[k] the unique solution p^[k] = 0, so that x^[k+1] = x_min. We can get the same eventual solution even if B^[k] ≠ H^[k]. Thus, if B^[k] is (nearly) singular, we could add a value τ > 0 to each diagonal element, such that by Gershgorin's theorem (B^[k] + τI) is positive-definite. Then, we solve $(B^{[k]} + \tau I)\, p^{[k]} = -\gamma^{[k]}$ to generate a search direction that is guaranteed to point downhill.
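The diagonal-shift idea can be sketched as follows. This is an illustrative implementation, not the text's own code, and the small positive margin used to choose τ is an assumption:

```python
import numpy as np

def shifted_newton_direction(B, gamma, margin=0.1):
    """Choose tau from Gershgorin's theorem so that every Gershgorin
    disc of (B + tau*I) lies strictly in the right half-plane, then
    solve (B + tau*I) p = -gamma for a guaranteed descent direction."""
    d = np.diag(B)
    radii = np.sum(np.abs(B), axis=1) - np.abs(d)  # off-diagonal row sums
    lower = np.min(d - radii)        # Gershgorin lower bound on eigenvalues
    tau = max(0.0, margin - lower)   # shift only if the bound is too small
    return np.linalg.solve(B + tau * np.eye(len(d)), -gamma)

# Indefinite B: the unshifted direction would not point downhill
B = np.array([[1.0, 0.0], [0.0, -2.0]])
gamma = np.array([1.0, 1.0])
p = shifted_newton_direction(B, gamma)
```

Since every eigenvalue of B + τI exceeds the smallest shifted Gershgorin bound, the shifted matrix is positive-definite and p · γ < 0.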
Often we construct B^[k] from past gradient evaluations. As the Hessian is the Jacobian of the gradient, we generate B^[k+1] from B^[k] so that it satisfies the secant condition

$$B^{[k+1]}\, \Delta x = \Delta\gamma, \qquad \Delta x = x^{[k+1]} - x^{[k]}, \qquad \Delta\gamma = \gamma^{[k+1]} - \gamma^{[k]} \tag{5.36}$$
or, a corresponding condition for the Hessian inverse

$$\left(B^{[k+1]}\right)^{-1} \Delta\gamma = \Delta x \tag{5.37}$$
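For a quadratic cost the exact Hessian satisfies the secant condition exactly, since the gradient is linear in x; this gives a quick numerical sanity check (a sketch; the quadratic F and its Hessian here are assumed for illustration):

```python
import numpy as np

# For a quadratic F(x) = 0.5 x^T H x, the gradient is gamma(x) = H x,
# so the true Hessian satisfies the secant condition (5.36) exactly:
#   H (x[k+1] - x[k]) = gamma[k+1] - gamma[k]
H = np.array([[4.0, 1.0], [1.0, 3.0]])
grad = lambda x: H @ x

x_k  = np.array([1.0, -1.0])
x_k1 = np.array([0.2,  0.5])
dx = x_k1 - x_k
dgamma = grad(x_k1) - grad(x_k)
```

Any B^[k+1] built to satisfy (5.36) must reproduce the action of H on Δx; the inverse form (5.37) holds as well whenever H is nonsingular.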
                   The smallest update consistent with (5.37) is achieved by the Broyden–Fletcher–Goldfarb–
                   Shanno (BFGS) formula, which ensures that if B [0]  is positive-definite (e.g., B [0]  = I), so