


decreases monotonically,

$$\frac{d}{ds}\,\bigl|p(s)\bigr|^2 > 0 \qquad\qquad \frac{d}{ds}\, m^{[k]}\bigl(p(s)\bigr) < 0 \qquad\qquad (5.47)$$
As we are considering the case where p^(n) lies outside of the trust region and p^(c) lies within it, the minimum along the curve occurs at 1 ≤ s_c ≤ 2, where p(s) crosses the trust region boundary,

$$\bigl|p(s_c)\bigr|^2 = \Bigl| p^{(c)} + (s_c - 1)\bigl[\, p^{(n)} - p^{(c)} \bigr] \Bigr|^2 = \bigl( \Delta^{[k]} \bigr)^2 \qquad\qquad (5.48)$$
                  In this case, the accepted new estimate is

$$x^{[k+1]} = x^{[k]} + \Delta x^{[k]} \qquad \Delta x^{[k]} = p(s_c) = p^{(c)} + (s_c - 1)\bigl[\, p^{(n)} - p^{(c)} \bigr] \qquad\qquad (5.49)$$
The dogleg method allows us to quickly identify a point within the trust region that lowers the model cost function at least as much as the Cauchy point does. The advantage over the Newton line search procedure is that the full Newton step is not automatically accepted as the search direction, which avoids the problems inherent in its erratic size and direction when B^[k] is nearly singular. Further methods for finding approximate trust-region minima are discussed in Nocedal & Wright (1999).
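As an illustration of the step selection in (5.48) and (5.49), here is a minimal Python sketch, assuming the Cauchy point p^(c), the full Newton step p^(n), and the trust-region radius Δ^[k] have already been computed; the function name and arguments are illustrative, not from the text.

```python
import numpy as np

def dogleg_step(p_c, p_n, Delta):
    """Dogleg step of (5.48)-(5.49) for the case |p_n| > Delta >= |p_c|.

    p_c   : Cauchy point (assumed to lie inside the trust region)
    p_n   : full Newton step (assumed to lie outside the trust region)
    Delta : trust-region radius Delta^[k]
    """
    d = p_n - p_c
    # Solve |p_c + t d|^2 = Delta^2 for t = s_c - 1 in [0, 1]:
    #   (d.d) t^2 + 2 (p_c.d) t + (p_c.p_c - Delta^2) = 0
    a = d @ d
    b = 2.0 * (p_c @ d)
    c = p_c @ p_c - Delta**2
    t = (-b + np.sqrt(b**2 - 4.0 * a * c)) / (2.0 * a)  # positive root = boundary crossing
    return p_c + t * d  # this is Delta x^[k] = p(s_c)
```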



                  Newton methods for large problems


In the line search and trust-region Newton methods, we must obtain the full Newton step p^(n) by solving a linear system at each Newton iteration,


$$B^{[k]} p^{(n)} = -\gamma^{[k]} \qquad B^{[k]} \approx H\bigl( x^{[k]} \bigr) \qquad B^{[k]} > 0 \qquad\qquad (5.50)$$
As B^[k] is real, symmetric, and positive-definite, the conjugate gradient method can be used. As discussed in Chapter 1, fill-in during elimination for large, sparse systems yields very long computation times and massive memory requirements; however, in the conjugate gradient method, no elimination is necessary (see algorithm above). At each conjugate gradient iteration we only need to compute the product of B^[k] with the current estimate of p^(n). When B^[k] is sparse, this product is fast to compute, and we need to store only the nonzero elements of B^[k].
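As a minimal sketch of such a matrix-vector-product-only solve, the example below uses SciPy's sparse conjugate gradient routine; the tridiagonal B^[k] and the random gradient vector are stand-ins chosen only so that the example is self-contained.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Stand-in sparse model Hessian B^[k] (tridiagonal, positive-definite)
# and gradient gamma^[k]; in practice these come from the problem at hand.
N = 100_000
B = sp.diags([-1.0, 2.1, -1.0], offsets=[-1, 0, 1], shape=(N, N), format="csr")
gamma = np.random.default_rng(0).standard_normal(N)

# Conjugate gradient solve of B^[k] p^(n) = -gamma^[k]; only products of B
# with trial vectors are formed, so only the nonzero elements of B are stored.
p_n, info = spla.cg(B, -gamma, maxiter=500)
print(info == 0)  # info == 0 signals convergence
```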
                    BFGS can be applied to large problems when the Hessian is sparse if the update formula
                  (5.38) is provided with the sparsity pattern so that only the nonzero positions are stored
                  and updated. Alternatively, only the most recent gradient vectors may be retained and used
                  in (5.38). Both approaches allow the construction of approximate Hessians with limited
                  memory usage. For a more detailed discussion of memory-efficient BFGS methods, consult
                  Nocedal & Wright (1999).
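The second approach, retaining only the most recent vectors, leads to the limited-memory BFGS method discussed by Nocedal & Wright (1999). A minimal sketch of its standard two-loop recursion follows, assuming the step differences s = x^[k+1] − x^[k] and gradient differences y = γ^[k+1] − γ^[k] from the last few iterations have been stored in lists; all names here are illustrative.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: apply an approximate inverse Hessian, built from
    the m most recent (s, y) pairs, to the gradient without storing a matrix."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * (s @ q)
        alphas.append(alpha)
        q -= alpha * y
    # Scale by a common heuristic estimate of the initial inverse Hessian
    if s_list:
        scale = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
    else:
        scale = 1.0
    r = scale * q
    # Second loop: oldest pair to newest
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * (y @ r)
        r += (alpha - beta) * s
    return -r  # approximate Newton (descent) direction
```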
The need in Newton's method to store B^[k] and to solve a linear system (5.50) at each
                  iteration poses a significant challenge for large optimization problems. For large problems
                  with Hessians that are dense or whose sparsity patterns are unknown, the tricks above
                  cannot be used. Instead, the nonlinear conjugate gradient method, which does not require
                  any curvature knowledge, is recommended.
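As one self-contained illustration (not from the text), the sketch below calls SciPy's nonlinear conjugate gradient routine, which requires only the cost function and its gradient and never forms or stores B^[k]; the Rosenbrock test function serves purely as a stand-in objective.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Nonlinear conjugate gradient: only F(x) and its gradient are supplied;
# no Hessian or Hessian approximation is ever formed.
x0 = np.full(1000, -1.0)
result = minimize(rosen, x0, jac=rosen_der, method="CG",
                  options={"maxiter": 10_000, "gtol": 1e-6})
print(result.success, result.fun)
```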