To make things general, let us use not the exact Hessian, but some real-symmetric
approximation to it:
$$B^{[k]} \approx H\!\left(x^{[k]}\right), \qquad \left(B^{[k]}\right)^T = B^{[k]} \tag{5.31}$$
The search direction is obtained by solving
$$B^{[k]} p^{[k]} = -\gamma^{[k]} \tag{5.32}$$
For small $a^{[k]}$, we approximate the change in $F(x)$ as
$$F\!\left(x^{[k]} + a^{[k]} p^{[k]}\right) - F\!\left(x^{[k]}\right) \approx a^{[k]}\, p^{[k]} \cdot \gamma^{[k]} + O\!\left[\left(a^{[k]}\right)^2\right] \tag{5.33}$$
But, from our update rule, $\gamma^{[k]} = -B^{[k]} p^{[k]}$; therefore,
$$F\!\left(x^{[k]} + a^{[k]} p^{[k]}\right) - F\!\left(x^{[k]}\right) \approx -a^{[k]}\, p^{[k]} \cdot B^{[k]} p^{[k]} + O\!\left[\left(a^{[k]}\right)^2\right] \tag{5.34}$$
We first try $a^{[k]} = 1$, and iteratively halve its value until we find that the descent criteria are satisfied. Thus, $a^{[k]}$ eventually will be small enough for (5.34) to be valid, so that $F(x)$ will be reduced if $p^{[k]} \cdot B^{[k]} p^{[k]} > 0$, i.e., if $B^{[k]}$ is positive-definite (all of its eigenvalues are positive).
Let $x_{\min}$ be a local minimum such that no neighboring points have a lower cost function. Then, since $\gamma(x_{\min}) = 0$, for very small $p$,
$$F(x_{\min} + p) - F(x_{\min}) \approx \frac{1}{2}\, p^T H(x_{\min})\, p + O\!\left[|p|^3\right] \;\Rightarrow\; p^T H(x_{\min})\, p \ge 0 \tag{5.35}$$
It is possible for $H(x_{\min})$ to have one or more zero eigenvalues and for $x_{\min}$ still to be a local minimum, but there can be no negative eigenvalues. Thus, at (and very nearby) a minimum, the Hessian is at least positive-semidefinite. Away from the near vicinity of a local minimum, however, it is quite possible for the Hessian to have negative eigenvalues, so that the search direction generated by $H(x^{[k]})\, p^{[k]} = -\gamma(x^{[k]})$ is not a descent direction.
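A small numerical check makes this concrete (a Python/NumPy sketch with made-up values, not an example from the text): with an indefinite Hessian, the Newton direction can satisfy $p \cdot \gamma > 0$ and so point uphill.

```python
import numpy as np

# An indefinite Hessian (eigenvalues 1 and -2) and a gradient,
# chosen purely for illustration
H = np.array([[1.0,  0.0],
              [0.0, -2.0]])
gamma = np.array([0.5, 1.0])

p = np.linalg.solve(H, -gamma)   # Newton direction: H p = -gamma

print(np.linalg.eigvalsh(H))     # [-2.  1.] -> H is not positive-definite
print(p @ gamma)                 # 0.25 > 0 -> p is not a descent direction
```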
It is not necessary, however, that we use the exact Hessian when computing $p^{[k]}$. As $\gamma(x_{\min}) = 0$, $B^{[k]} p^{[k]} = -\gamma(x_{\min})$ has for any nonsingular $B^{[k]}$ the unique solution $p^{[k]} = 0$, so that $x^{[k+1]} = x_{\min}$. We can get the same eventual solution even if $B^{[k]} \ne H(x^{[k]})$. Thus, if $B^{[k]}$ is (nearly) singular, we could add a value $\tau > 0$ to each diagonal element, such that by Gershgorin's theorem $(B^{[k]} + \tau I)$ is positive-definite. Then, we solve $(B^{[k]} + \tau I)\, p^{[k]} = -\gamma^{[k]}$ to generate a search direction that is guaranteed to point downhill.
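As a sketch of how $\tau$ might be chosen (Python/NumPy; the helper name `gershgorin_shift` and the data values are hypothetical), note that for symmetric $B$ every eigenvalue lies in an interval $[B_{ii} - R_i,\, B_{ii} + R_i]$, where $R_i = \sum_{j \ne i} |B_{ij}|$, so taking $\tau > \max_i (R_i - B_{ii})$ forces all eigenvalues of $B + \tau I$ to be positive:

```python
import numpy as np

def gershgorin_shift(B, margin=1e-8):
    """Hypothetical helper: a tau >= 0 such that, by Gershgorin's theorem,
    every eigenvalue of B + tau*I is at least `margin` > 0."""
    R = np.sum(np.abs(B), axis=1) - np.abs(np.diag(B))  # off-diagonal row sums
    return max(0.0, np.max(R - np.diag(B)) + margin)

# A symmetric, indefinite B and a gradient (illustrative values only)
B = np.array([[1.0, 3.0],
              [3.0, 1.0]])
gamma = np.array([1.0, -2.0])

tau = gershgorin_shift(B, margin=0.5)
p = np.linalg.solve(B + tau * np.eye(2), -gamma)

print(np.linalg.eigvalsh(B + tau * np.eye(2)))  # all positive
print(p @ gamma)                                # negative: p points downhill
```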
Often we construct $B^{[k]}$ from past gradient evaluations. As the Hessian is the Jacobian of the gradient, we generate $B^{[k+1]}$ from $B^{[k]}$ so that it satisfies the secant condition
$$B^{[k+1]} \Delta x = \Delta\gamma, \qquad \Delta x = x^{[k+1]} - x^{[k]}, \quad \Delta\gamma = \gamma^{[k+1]} - \gamma^{[k]} \tag{5.36}$$
or, a corresponding condition for the Hessian inverse,
$$\left(B^{[k+1]}\right)^{-1} \Delta\gamma = \Delta x \tag{5.37}$$
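To make the secant condition concrete, here is a short check (a Python/NumPy sketch with made-up step data) using the standard rank-two BFGS update of $B$, the formula discussed just below; by construction it satisfies (5.36) exactly:

```python
import numpy as np

def bfgs_update(B, dx, dgamma):
    """Standard BFGS update of a symmetric Hessian approximation B.
    dx = x^[k+1] - x^[k], dgamma = gamma^[k+1] - gamma^[k]."""
    Bdx = B @ dx
    return (B
            - np.outer(Bdx, Bdx) / (dx @ Bdx)
            + np.outer(dgamma, dgamma) / (dgamma @ dx))

# Illustrative step and gradient-change data (made-up values,
# chosen so that dgamma . dx > 0)
B = np.eye(2)                          # B^[0] = I, positive-definite
dx = np.array([0.1, -0.2])
dgamma = np.array([0.3, 0.1])

B1 = bfgs_update(B, dx, dgamma)
print(np.allclose(B1 @ dx, dgamma))    # True: secant condition (5.36) holds
print(np.linalg.eigvalsh(B1))          # all positive: B^[1] stays positive-definite
```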
The smallest update consistent with (5.37) is achieved by the Broyden–Fletcher–Goldfarb–Shanno (BFGS) formula, which ensures that if $B^{[0]}$ is positive-definite (e.g., $B^{[0]} = I$), so