5.7.1 The Conjugate-Gradient Method
The conjugate-gradient method uses a line search technique, which determines the
error minimum along a given line, combined with an efficient choice of the
search direction on the error-energy surface. At each iterative step of this
method there are two main tasks to be solved:
1. Find the minimum along the surface line corresponding to a certain search
direction.
2. Determine the next search direction as the conjugate of the previous one
(both tasks are implemented in the sketch below).
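As a minimal sketch of these two tasks, assume a quadratic error E(w) = ½ wᵀHw + bᵀw with a symmetric positive definite Hessian H; the Python function names and the closed-form expressions below are illustrative, not taken from the book:

    import numpy as np

    def line_minimum(H, b, w, d):
        """Task 1: exact minimum of E along the line w + alpha*d
        (closed form because E is quadratic)."""
        g = H @ w + b                     # gradient of E at w
        alpha = -(g @ d) / (d @ H @ d)    # step size to the line minimum
        return w + alpha * d

    def next_conjugate_direction(H, b, w_new, d):
        """Task 2: next direction, H-conjugate to the previous one."""
        g_new = H @ w_new + b             # gradient at the new line minimum
        beta = (g_new @ (H @ d)) / (d @ H @ d)
        return -g_new + beta * d          # satisfies d^T H d_new = 0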
Let us illustrate how the method operates with the example of the quadratic
function (5-43). At the first iteration (r = 1), we start with an arbitrary
direction d^(1), represented by the white arrow in Figure 5.37, with slope -1.2.
Next, we determine the minimum of E along this line, which occurs at
w* = (-1.59, -1.065) (see Exercise 5.20). Let us now compute the gradient at
this point.
The gradient, represented by the dotted white line in Figure 5.37, has slope
1/1.2, i.e., it is orthogonal to the chosen direction at the minimum point. This
is a general property: the projection of the gradient onto the chosen direction
is always zero at the line minimum. If we were now to move along the gradient,
as in the gradient descent method, we would actually move away from the minimum.
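This orthogonality follows from elementary calculus: writing the error along the search line as a function of the step size $\alpha$, the derivative with respect to $\alpha$ must vanish at the line minimum $\mathbf{w}^*$:

$$\left.\frac{d}{d\alpha} E\big(\mathbf{w} + \alpha\,\mathbf{d}^{(1)}\big)\right|_{\mathbf{w}^*} = \nabla E(\mathbf{w}^*)^{\mathsf{T}}\,\mathbf{d}^{(1)} = 0 .$$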
Instead, we move along a direction satisfying the following property:
$$\mathbf{d}^{(r)\mathsf{T}}\,\mathbf{H}\,\mathbf{d}^{(r+1)} = 0,$$

where H is the Hessian matrix.
Directions satisfying this condition are called conjugate directions, since they
do not interfere with each other. As a matter of fact, along the conjugate direction,
represented by the black arrow in Figure 5.37, the component of the gradient
parallel to the previous direction (white lines) remains zero, and we are in fact
moving along a sequence of minima relative to the previous direction.
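One standard way of constructing such a direction, and the choice used in the sketch above, is to take the negative gradient at the new point and add a correction along the previous direction:

$$\mathbf{d}^{(r+1)} = -\nabla E\big(\mathbf{w}^{(r+1)}\big) + \beta\,\mathbf{d}^{(r)}, \qquad \beta = \frac{\nabla E\big(\mathbf{w}^{(r+1)}\big)^{\mathsf{T}}\,\mathbf{H}\,\mathbf{d}^{(r)}}{\mathbf{d}^{(r)\mathsf{T}}\,\mathbf{H}\,\mathbf{d}^{(r)}},$$

where $\beta$ is chosen precisely so that the conjugacy condition above is satisfied.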
Next, we again determine the line minimum, which in this case is the global
error minimum, and the process terminates. Otherwise, the previous two steps
would have to be repeated until convergence.
In general, for a quadratic error E defined by a w×w Hessian matrix H, there are
w conjugate directions, which are orthogonal to each other only if H = I. The
conjugate-gradient method guarantees a steadily decreasing error and finds a
minimum within at most w iterations. It is, therefore, a very fast method. Its
only problem is that once a local minimum is reached, it will not move away
from it.
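The at-most-w-iterations guarantee is easy to check numerically. The sketch below runs both tasks on an arbitrary five-dimensional quadratic; the matrix H, the vector b, and the starting point are illustrative assumptions, unrelated to function (5-43). The gradient norm should fall to machine precision within five iterations:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    H = A @ A.T + 5 * np.eye(5)         # symmetric positive definite Hessian
    b = rng.standard_normal(5)
    w = rng.standard_normal(5)          # arbitrary starting weights

    d = -(H @ w + b)                    # first direction: steepest descent
    for r in range(5):                  # at most w = 5 iterations needed
        g = H @ w + b
        if np.linalg.norm(g) < 1e-12:   # already at the minimum
            break
        alpha = -(g @ d) / (d @ H @ d)  # task 1: exact line minimum
        w = w + alpha * d
        g_new = H @ w + b
        beta = (g_new @ (H @ d)) / (d @ H @ d)
        d = -g_new + beta * d           # task 2: H-conjugate direction
        print(r + 1, np.linalg.norm(g_new))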
The algorithm for implementing the conjugate-gradient method must solve two
main tasks: the determination of the minimum along a line and an efficient
method of determining the successive conjugate directions.