decreases monotonically,
\frac{d}{ds}\,|p(s)|^2 > 0 \qquad \frac{d}{ds}\, m^{[k]}(p(s)) < 0 \qquad (5.47)
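To make (5.47) concrete, here is a minimal Python sketch (the 2 x 2 model Hessian B, the gradient g, and the function names are illustrative assumptions, not part of the text): it constructs the dogleg curve from the Cauchy point p^{(c)} and the full Newton step p^{(n)} and verifies numerically that |p(s)|^2 increases while m^{[k]}(p(s)) decreases over 0 <= s <= 2.

# Numerical check of (5.47) along the dogleg curve (illustrative sketch;
# the quadratic model m(p) = g.p + 0.5 p'Bp and the 2x2 data are assumed).
import numpy as np

B = np.array([[3.0, 1.0], [1.0, 2.0]])     # symmetric, positive-definite B[k]
g = np.array([1.0, -2.0])                  # gradient gamma[k]

def model(p):                              # quadratic model m[k](p)
    return g @ p + 0.5 * p @ B @ p

p_c = -(g @ g) / (g @ B @ g) * g           # Cauchy point: model minimum along -g
p_n = np.linalg.solve(B, -g)               # full Newton step: B p = -g

def dogleg(s):                             # p(s), 0 <= s <= 2
    return s * p_c if s <= 1.0 else p_c + (s - 1.0) * (p_n - p_c)

s_vals = np.linspace(0.0, 2.0, 201)
norm_sq = [dogleg(s) @ dogleg(s) for s in s_vals]
m_vals = [model(dogleg(s)) for s in s_vals]
print("|p(s)|^2 nondecreasing:", np.all(np.diff(norm_sq) >= 0))
print("m(p(s))  nonincreasing:", np.all(np.diff(m_vals) <= 0))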
As we are considering the case where p^{(n)} lies outside of the trust region and p^{(c)} lies within
it, the minimum along the curve occurs at 1 ≤ s_c ≤ 2, where p(s) crosses the trust-region
boundary,
|p(s_c)|^2 = \left| p^{(c)} + (s_c - 1)\left( p^{(n)} - p^{(c)} \right) \right|^2 = \left( \Delta^{[k]} \right)^2 \qquad (5.48)
In this case, the accepted new estimate is
x^{[k+1]} = x^{[k]} + \Delta x^{[k]} \qquad \Delta x^{[k]} = p(s_c) = p^{(c)} + (s_c - 1)\left( p^{(n)} - p^{(c)} \right) \qquad (5.49)
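Since (5.48) is just a scalar quadratic in (s_c − 1), the crossing point is cheap to find. A minimal Python sketch follows (the function name dogleg_step, the example numbers, and the variable Delta for the trust-region radius are assumptions for illustration):

# Solve |p_c + (s_c - 1)(p_n - p_c)|^2 = Delta^2 for s_c in [1, 2]   (5.48)
# and return the dogleg step Delta_x = p(s_c)                        (5.49)
import numpy as np

def dogleg_step(p_c, p_n, Delta):
    d = p_n - p_c
    a = d @ d                       # quadratic a t^2 + 2 b t + c = 0, t = s_c - 1
    b = p_c @ d
    c = p_c @ p_c - Delta**2        # c < 0 since p_c lies inside the region
    t = (-b + np.sqrt(b**2 - a * c)) / a    # positive root, 0 <= t <= 1
    return 1.0 + t, p_c + t * d

# Assumed example: Newton step outside a trust region of radius Delta = 1
p_c = np.array([0.3, -0.2])
p_n = np.array([1.5, -1.0])
s_c, dx = dogleg_step(p_c, p_n, Delta=1.0)
print(s_c, np.linalg.norm(dx))      # |p(s_c)| equals Delta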
The dogleg method allows us to quickly identify a point within the trust region that lowers
the model cost function at least as much as the Cauchy point. The advantage over the
Newton line search procedure is that the full Newton step is not automatically accepted as
the search direction, avoiding the problems inherent in its erratic size and direction when
B^{[k]} is nearly singular. Further methods to find approximate trust-region minima are
discussed in Nocedal & Wright (1999).
Newton methods for large problems
In the line search and trust-region Newton methods, we must obtain the full Newton step
p^{(n)} by solving a linear system at each Newton iteration,
B^{[k]} p^{(n)} = -\gamma^{[k]} \qquad B^{[k]} \approx H\!\left(x^{[k]}\right) \qquad B^{[k]} > 0 \qquad (5.50)
As B^{[k]} is real-symmetric/positive-definite, the conjugate gradient method can be used. As
discussed in Chapter 1, fill-in during elimination for large, sparse systems yields very long
computation times and massive memory requirements; however, in the conjugate gradient
method, no elimination is necessary (see algorithm above). We only need to compute at
each conjugate gradient iteration the product of B^{[k]} with the current estimate of p^{(n)}. When
B^{[k]} is sparse, this product is fast to compute and we only need to store the nonzero elements
of B^{[k]}.
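A minimal sketch of this matrix-vector-product-only strategy, assuming SciPy's sparse tools (the tridiagonal B and random gradient below are placeholders for an actual model Hessian and gradient):

# Sketch: solve B p = -gamma (5.50) by conjugate gradients, supplying only
# the action p -> B p of the sparse matrix; no elimination, no fill-in.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, LinearOperator

n = 10_000
B = sp.diags([-1.0, 2.5, -1.0], offsets=[-1, 0, 1],
             shape=(n, n), format="csr")              # assumed sparse SPD B[k]
gamma = np.random.default_rng(0).standard_normal(n)   # placeholder gradient

B_op = LinearOperator((n, n), matvec=lambda p: B @ p) # only matvecs are needed
p_newton, info = cg(B_op, -gamma)                     # info == 0 on convergence
print(info, np.linalg.norm(B @ p_newton + gamma))     # residual check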
BFGS can be applied to large problems when the Hessian is sparse if the update formula
(5.38) is provided with the sparsity pattern so that only the nonzero positions are stored
and updated. Alternatively, only the most recent gradient vectors may be retained and used
in (5.38). Both approaches allow the construction of approximate Hessians with limited
memory usage. For a more detailed discussion of memory-efficient BFGS methods, consult
Nocedal & Wright (1999).
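As one concrete illustration of the limited-memory idea, the sketch below uses the standard L-BFGS two-loop recursion from Nocedal & Wright (1999), not formula (5.38) itself; it applies an approximate inverse Hessian to the current gradient while storing only the most recent step vectors s = x^{[k+1]} − x^{[k]} and gradient changes y = γ^{[k+1]} − γ^{[k]}. The function name and list-based storage are assumptions.

# Limited-memory BFGS sketch: compute a quasi-Newton search direction from
# only the m most recent (s, y) pairs via the two-loop recursion
# (Nocedal & Wright, 1999); the full n x n Hessian is never stored.
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    if not s_list:                       # no history yet: steepest descent
        return -grad
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * (s @ q)
        q -= alpha * y
        alphas.append(alpha)
    # Scale by an initial inverse-Hessian estimate from the newest pair
    s, y = s_list[-1], y_list[-1]
    r = (s @ y) / (y @ y) * q
    # Second loop: oldest pair to newest
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * (y @ r)
        r += (alpha - beta) * s
    return -r                            # approximates -H^{-1} grad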
The need in Newton’s method to store B^{[k]} and to solve a linear system (5.50) at each
iteration poses a significant challenge for large optimization problems. For large problems
with Hessians that are dense or whose sparsity patterns are unknown, the tricks above
cannot be used. Instead, the nonlinear conjugate gradient method, which does not require
any curvature knowledge, is recommended.
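For reference, a minimal sketch of one such curvature-free iteration is given below: a Polak-Ribière nonlinear conjugate gradient loop built on SciPy's Wolfe line search. The function nonlinear_cg, the tolerances, and the test problem are illustrative assumptions, not the text's algorithm.

# Sketch: nonlinear conjugate gradients (Polak-Ribiere+) for minimizing F(x)
# using gradients only -- no Hessian storage and no linear solves.
import numpy as np
from scipy.optimize import line_search

def nonlinear_cg(F, grad_F, x0, tol=1e-6, max_iter=500):
    x = np.asarray(x0, dtype=float)
    g = grad_F(x)
    p = -g                                       # first direction: steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = line_search(F, grad_F, x, p)[0]  # Wolfe line search along p
        if alpha is None:                        # failed: restart along -g
            p = -g
            alpha = line_search(F, grad_F, x, p)[0]
            if alpha is None:
                break
        x = x + alpha * p
        g_new = grad_F(x)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))   # Polak-Ribiere+
        p = -g_new + beta * p
        g = g_new
    return x

# Assumed quadratic test problem
D = np.diag([1.0, 10.0])
F = lambda x: 0.5 * x @ D @ x
grad_F = lambda x: D @ x
print(nonlinear_cg(F, grad_F, np.array([3.0, -2.0])))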