7.1.5 Newton Method
Like the steepest descent method, this method also uses the gradient to search for
the minimum point of an objective function. Such gradient-based optimization
methods are supposed to reach a point at which the gradient is (close to) zero.
In this context, the optimization of an objective function f(x) is equivalent to
finding a zero of its gradient g(x), which in general is a vector-valued function
of a vector-valued independent variable x. Therefore, if we have the gradient
function g(x) of the objective function f(x), we can solve the system of nonlinear
equations g(x) = 0 to get the minimum of f(x) by using the Newton method
explained in Section 4.4.
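For instance, the minimization can be carried out by handing the gradient function over to a nonlinear-system solver. The following is a minimal sketch of this idea using MATLAB's fsolve (Optimization Toolbox) in place of the Newton method of Section 4.4; the objective function, its gradient, and the initial guess are made up for illustration and are not examples from the text.

% A minimal sketch, not from the text: minimize f(x) = (x1 - 1)^2 + 2*(x2 + 1)^2
% by solving the nonlinear system g(x) = 0 formed by its gradient.
g = @(x)[2*(x(1) - 1); 4*(x(2) + 1)];   % gradient of f, a vector-valued function of x
x0 = [3; 2];                            % initial guess
xmin = fsolve(g, x0)                    % zero of the gradient, [1; -1], is the minimizer of f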
The background of this method, as well as that of the steepest descent method, can be seen by taking the Taylor series expansion of, say, a two-variable objective function $f(x_1, x_2)$:
$$
f(x_1, x_2) \cong f(x_{1k}, x_{2k})
+ \left.\begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} \end{bmatrix}\right|_{(x_{1k}, x_{2k})}
\begin{bmatrix} x_1 - x_{1k} \\ x_2 - x_{2k} \end{bmatrix}
$$
$$
+ \frac{1}{2} \begin{bmatrix} x_1 - x_{1k} & x_2 - x_{2k} \end{bmatrix}
\left.\begin{bmatrix} \partial^2 f/\partial x_1^2 & \partial^2 f/\partial x_1 \partial x_2 \\ \partial^2 f/\partial x_2 \partial x_1 & \partial^2 f/\partial x_2^2 \end{bmatrix}\right|_{(x_{1k}, x_{2k})}
\begin{bmatrix} x_1 - x_{1k} \\ x_2 - x_{2k} \end{bmatrix}
$$
$$
f(\mathbf{x}) \cong f(\mathbf{x}_k) + \nabla f(\mathbf{x})^T\big|_{\mathbf{x}_k} [\mathbf{x} - \mathbf{x}_k]
+ \frac{1}{2} [\mathbf{x} - \mathbf{x}_k]^T \, \nabla^2 f(\mathbf{x})\big|_{\mathbf{x}_k} \, [\mathbf{x} - \mathbf{x}_k]
$$
$$
f(\mathbf{x}) \cong f(\mathbf{x}_k) + \mathbf{g}_k^T [\mathbf{x} - \mathbf{x}_k]
+ \frac{1}{2} [\mathbf{x} - \mathbf{x}_k]^T H_k [\mathbf{x} - \mathbf{x}_k] \tag{7.1.10}
$$
with the gradient vector $\mathbf{g}_k = \nabla f(\mathbf{x})|_{\mathbf{x}_k}$ and the Hessian matrix $H_k = \nabla^2 f(\mathbf{x})|_{\mathbf{x}_k}$.
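As a side note, the gradient vector and Hessian matrix appearing in Eq. (7.1.10) can be formed symbolically. The sketch below assumes the Symbolic Math Toolbox is available and uses a made-up objective function that is not an example from the text.

% A minimal sketch, assuming the Symbolic Math Toolbox is available.
syms x1 x2
f = x1^4 - 2*x1*x2 + x2^2;              % illustrative objective function
g = gradient(f, [x1 x2])                % gradient vector; g_k once evaluated at x_k
H = hessian(f, [x1 x2])                 % Hessian matrix; H_k once evaluated at x_k
gk = double(subs(g, [x1 x2], [2 1]))    % g_k at x_k = (2, 1)
Hk = double(subs(H, [x1 x2], [2 1]))    % H_k at x_k = (2, 1)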
In the light of this equation, we can see that the value of the objective function at the point $\mathbf{x}_{k+1}$ updated by the steepest descent algorithm described by Eq. (7.1.9)
$$
\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k \mathbf{g}_k / \|\mathbf{g}_k\| \tag{7.1.9}
$$
is most likely smaller than that at the old point $\mathbf{x}_k$, with the third term in Eq. (7.1.10) neglected.
$$
f(\mathbf{x}_{k+1}) \cong f(\mathbf{x}_k) + \mathbf{g}_k^T [\mathbf{x}_{k+1} - \mathbf{x}_k]
= f(\mathbf{x}_k) - \alpha_k \mathbf{g}_k^T \mathbf{g}_k / \|\mathbf{g}_k\| \tag{7.1.11}
$$
$$
f(\mathbf{x}_{k+1}) - f(\mathbf{x}_k) \cong -\alpha_k \mathbf{g}_k^T \mathbf{g}_k / \|\mathbf{g}_k\| \le 0
\quad\Rightarrow\quad f(\mathbf{x}_{k+1}) \le f(\mathbf{x}_k)
$$
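The inequality in (7.1.11) is easy to check numerically. The following sketch takes a single steepest descent step of Eq. (7.1.9) for a made-up quadratic objective; the step size alpha = 0.5 and the current point are arbitrary illustrative choices.

% A minimal numerical check of (7.1.11) with an illustrative objective function.
f = @(x)(x(1) - 1)^2 + 2*(x(2) + 1)^2;
g = @(x)[2*(x(1) - 1); 4*(x(2) + 1)];   % gradient of f
xk = [3; 2];  alpha = 0.5;              % current point and step size (both made up)
gk = g(xk);
xk1 = xk - alpha*gk/norm(gk);           % steepest descent update, Eq. (7.1.9)
[f(xk), f(xk1)]                         % f(x_{k+1}) turns out smaller than f(x_k)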
Slightly different from this strategy of the steepest descent algorithm, the Newton method tries to go straight to the zero of the gradient of the approximate objective function (7.1.10)
$$
\mathbf{g}_k + H_k [\mathbf{x} - \mathbf{x}_k] = 0, \qquad \mathbf{x} = \mathbf{x}_k - H_k^{-1} \mathbf{g}_k \tag{7.1.12}
$$
by the updating rule
$$
\mathbf{x}_{k+1} = \mathbf{x}_k - H_k^{-1} \mathbf{g}_k \tag{7.1.13}
$$
with the gradient vector $\mathbf{g}_k = \nabla f(\mathbf{x})|_{\mathbf{x}_k}$ and the Hessian matrix $H_k = \nabla^2 f(\mathbf{x})|_{\mathbf{x}_k}$ (Appendix C).
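To make the updating rule concrete, here is a minimal sketch of the Newton iteration (7.1.13) with an analytic gradient and Hessian; the objective function, starting point, and stopping tolerance are illustrative choices, not taken from the text. Note that, unlike the steepest descent update (7.1.9), no step size alpha_k has to be chosen.

% A minimal sketch of the Newton update (7.1.13) on an illustrative objective
% f(x) = x1^4 - 2*x1*x2 + x2^2, whose minimum is at (1/sqrt(2), 1/sqrt(2)).
f = @(x) x(1)^4 - 2*x(1)*x(2) + x(2)^2;
g = @(x)[4*x(1)^3 - 2*x(2); 2*x(2) - 2*x(1)];   % gradient vector g_k
H = @(x)[12*x(1)^2, -2; -2, 2];                 % Hessian matrix H_k
x = [2; 1];                                     % initial guess
for k = 1:50
    dx = -(H(x)\g(x));                          % Newton step -H_k^{-1} g_k
    x = x + dx;                                 % updating rule (7.1.13)
    if norm(dx) < 1e-10, break; end             % stop when the update is negligible
end
x, f(x)                                         % about [0.7071; 0.7071] and -0.25

For this starting point the iteration settles within a few steps, since the Newton step exploits the curvature information contained in H_k.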