decreases monotonically,
\frac{d}{ds}\,|p(s)|^2 > 0 \qquad \frac{d}{ds}\, m^{[k]}(p(s)) < 0 \qquad (5.47)
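To make (5.47) concrete, here is a minimal Python sketch (the 2 x 2 model Hessian B, the gradient g, and the function names are illustrative assumptions, not part of the text): it constructs the dogleg curve from the Cauchy point p^{(c)} and the full Newton step p^{(n)} and verifies numerically that |p(s)|^2 increases while m^{[k]}(p(s)) decreases over 0 <= s <= 2.

# Numerical check of (5.47) along the dogleg curve (illustrative sketch;
# the quadratic model m(p) = g.p + 0.5 p'Bp and the 2x2 data are assumed).
import numpy as np

B = np.array([[3.0, 1.0], [1.0, 2.0]])     # symmetric, positive-definite B[k]
g = np.array([1.0, -2.0])                  # gradient gamma[k]

def model(p):                              # quadratic model m[k](p)
    return g @ p + 0.5 * p @ B @ p

p_c = -(g @ g) / (g @ B @ g) * g           # Cauchy point: model minimum along -g
p_n = np.linalg.solve(B, -g)               # full Newton step: B p = -g

def dogleg(s):                             # p(s), 0 <= s <= 2
    return s * p_c if s <= 1.0 else p_c + (s - 1.0) * (p_n - p_c)

s_vals = np.linspace(0.0, 2.0, 201)
norm_sq = [dogleg(s) @ dogleg(s) for s in s_vals]
m_vals = [model(dogleg(s)) for s in s_vals]
print("|p(s)|^2 nondecreasing:", np.all(np.diff(norm_sq) >= 0))
print("m(p(s))  nonincreasing:", np.all(np.diff(m_vals) <= 0))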
As we are considering the case where p^{(n)} lies outside of the trust region and p^{(c)} lies within
it, the minimum along the curve occurs at 1 ≤ s_c ≤ 2, where p(s) crosses the trust-region
boundary,
|p(s_c)|^2 = \left| p^{(c)} + (s_c - 1)\left( p^{(n)} - p^{(c)} \right) \right|^2 = \left( \Delta^{[k]} \right)^2 \qquad (5.48)
In this case, the accepted new estimate is
x^{[k+1]} = x^{[k]} + \Delta x^{[k]} \qquad \Delta x^{[k]} = p(s_c) = p^{(c)} + (s_c - 1)\left( p^{(n)} - p^{(c)} \right) \qquad (5.49)
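Since (5.48) is just a scalar quadratic in (s_c − 1), the crossing point is cheap to find. A minimal Python sketch follows (the function name dogleg_step, the example numbers, and the variable Delta for the trust-region radius are assumptions for illustration):

# Solve |p_c + (s_c - 1)(p_n - p_c)|^2 = Delta^2 for s_c in [1, 2]   (5.48)
# and return the dogleg step Delta_x = p(s_c)                        (5.49)
import numpy as np

def dogleg_step(p_c, p_n, Delta):
    d = p_n - p_c
    a = d @ d                       # quadratic a t^2 + 2 b t + c = 0, t = s_c - 1
    b = p_c @ d
    c = p_c @ p_c - Delta**2        # c < 0 since p_c lies inside the region
    t = (-b + np.sqrt(b**2 - a * c)) / a    # positive root, 0 <= t <= 1
    return 1.0 + t, p_c + t * d

# Assumed example: Newton step outside a trust region of radius Delta = 1
p_c = np.array([0.3, -0.2])
p_n = np.array([1.5, -1.0])
s_c, dx = dogleg_step(p_c, p_n, Delta=1.0)
print(s_c, np.linalg.norm(dx))      # |p(s_c)| equals Delta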
The dogleg method allows us to quickly identify a point within the trust region that lowers
the model cost function at least as much as the Cauchy point. The advantage over the
Newton line search procedure is that the full Newton step is not automatically accepted as
the search direction, avoiding the problems inherent in its erratic size and direction when
B^{[k]} is nearly singular. Further methods to find approximate trust-region minima are
discussed in Nocedal & Wright (1999).
Newton methods for large problems
In the line search and trust-region Newton methods, we must obtain the full Newton step
p^{(n)} by solving a linear system at each Newton iteration,
B^{[k]} p^{(n)} = -\gamma^{[k]} \qquad B^{[k]} \approx H\!\left(x^{[k]}\right) \qquad B^{[k]} > 0 \qquad (5.50)
As B^{[k]} is real-symmetric/positive-definite, the conjugate gradient method can be used. As
discussed in Chapter 1, fill-in during elimination for large, sparse systems yields very long
computation times and massive memory requirements; however, in the conjugate gradient
method, no elimination is necessary (see algorithm above). We only need to compute at
each conjugate gradient iteration the product of B^{[k]} with the current estimate of p^{(n)}. When
B^{[k]} is sparse, this product is fast to compute and we only need to store the nonzero elements
of B^{[k]}.
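A minimal sketch of this matrix-vector-product-only strategy, assuming SciPy's sparse tools (the tridiagonal B and random gradient below are placeholders for an actual model Hessian and gradient):

# Sketch: solve B p = -gamma (5.50) by conjugate gradients, supplying only
# the action p -> B p of the sparse matrix; no elimination, no fill-in.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, LinearOperator

n = 10_000
B = sp.diags([-1.0, 2.5, -1.0], offsets=[-1, 0, 1],
             shape=(n, n), format="csr")              # assumed sparse SPD B[k]
gamma = np.random.default_rng(0).standard_normal(n)   # placeholder gradient

B_op = LinearOperator((n, n), matvec=lambda p: B @ p) # only matvecs are needed
p_newton, info = cg(B_op, -gamma)                     # info == 0 on convergence
print(info, np.linalg.norm(B @ p_newton + gamma))     # residual check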
BFGS can be applied to large problems when the Hessian is sparse if the update formula
(5.38) is provided with the sparsity pattern so that only the nonzero positions are stored
and updated. Alternatively, only the most recent gradient vectors may be retained and used
in (5.38). Both approaches allow the construction of approximate Hessians with limited
memory usage. For a more detailed discussion of memory-efficient BFGS methods, consult
Nocedal & Wright (1999).
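As one concrete illustration of the limited-memory idea, the sketch below uses the standard L-BFGS two-loop recursion from Nocedal & Wright (1999), not formula (5.38) itself; it applies an approximate inverse Hessian to the current gradient while storing only the most recent step vectors s = x^{[k+1]} − x^{[k]} and gradient changes y = γ^{[k+1]} − γ^{[k]}. The function name and list-based storage are assumptions.

# Limited-memory BFGS sketch: compute a quasi-Newton search direction from
# only the m most recent (s, y) pairs via the two-loop recursion
# (Nocedal & Wright, 1999); the full n x n Hessian is never stored.
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    if not s_list:                       # no history yet: steepest descent
        return -grad
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * (s @ q)
        q -= alpha * y
        alphas.append(alpha)
    # Scale by an initial inverse-Hessian estimate from the newest pair
    s, y = s_list[-1], y_list[-1]
    r = (s @ y) / (y @ y) * q
    # Second loop: oldest pair to newest
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * (y @ r)
        r += (alpha - beta) * s
    return -r                            # approximates -H^{-1} grad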
The need in Newton’s method to store B^{[k]} and to solve a linear system (5.50) at each
iteration poses a significant challenge for large optimization problems. For large problems
with Hessians that are dense or whose sparsity patterns are unknown, the tricks above
cannot be used. Instead, the nonlinear conjugate gradient method, which does not require
any curvature knowledge, is recommended.
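For reference, a minimal sketch of one such curvature-free iteration is given below: a Polak-Ribière nonlinear conjugate gradient loop built on SciPy's Wolfe line search. The function nonlinear_cg, the tolerances, and the test problem are illustrative assumptions, not the text's algorithm.

# Sketch: nonlinear conjugate gradients (Polak-Ribiere+) for minimizing F(x)
# using gradients only -- no Hessian storage and no linear solves.
import numpy as np
from scipy.optimize import line_search

def nonlinear_cg(F, grad_F, x0, tol=1e-6, max_iter=500):
    x = np.asarray(x0, dtype=float)
    g = grad_F(x)
    p = -g                                       # first direction: steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = line_search(F, grad_F, x, p)[0]  # Wolfe line search along p
        if alpha is None:                        # failed: restart along -g
            p = -g
            alpha = line_search(F, grad_F, x, p)[0]
            if alpha is None:
                break
        x = x + alpha * p
        g_new = grad_F(x)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))   # Polak-Ribiere+
        p = -g_new + beta * p
        g = g_new
    return x

# Assumed quadratic test problem
D = np.diag([1.0, 10.0])
F = lambda x: 0.5 * x @ D @ x
grad_F = lambda x: D @ x
print(nonlinear_cg(F, grad_F, np.array([3.0, -2.0])))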