


5.7.1 The Conjugate-Gradient Method

The conjugate-gradient method uses a line-search technique, consisting of the determination of the error minimum along a certain line, combined with an efficient choice of the search direction on the error-energy surface. At each iterative step of the method there are two main tasks to be solved:

1. Find the minimum along the surface line corresponding to a certain search direction.
2. Determine the next search direction as the conjugate of the previous one.
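
   To make task 1 concrete: for a quadratic error of the form E(w) = 1/2 w^T H w + b^T w, the minimum along the line w + alpha*d has the closed-form solution alpha* = -d^T grad E(w) / (d^T H d). The following Python sketch illustrates this step; the matrix H and vector b are illustrative assumptions, not the function (5-43) of the text.

import numpy as np

# Illustrative quadratic error E(w) = 0.5 w^T H w + b^T w.
# H and b are assumed values, not the function (5-43) used in the text.
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -1.0])

def grad_E(w):
    """Gradient of the quadratic error at w."""
    return H @ w + b

def line_minimum(w, d):
    """Exact minimizer of E along the line w + alpha*d (task 1)."""
    alpha = -(grad_E(w) @ d) / (d @ H @ d)
    return w + alpha * d

# Start from an arbitrary point and an arbitrary search direction.
w0 = np.array([0.0, 0.0])
d1 = np.array([1.0, -1.2])               # direction with slope -1.2, as in the text
w1 = line_minimum(w0, d1)

# At the line minimum the gradient is orthogonal to the search direction.
print(np.isclose(grad_E(w1) @ d1, 0.0))  # True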

   Let us illustrate how the method operates with the example of the quadratic function (5-43). At the first iteration (r = 1), we start with an arbitrary direction d^{(1)}, represented by the white arrow in Figure 5.37, with slope -1.2. Next, we determine the minimum of E along this line, which occurs at w* = (-1.59, -1.065) (see Exercise 5.20), and compute the gradient at this point.
   The gradient, represented by the dotted white line in Figure 5.37, has slope 1/1.2, i.e., it is orthogonal to the chosen direction at the minimum point. This is a general property: the projection of the gradient along the chosen direction is always zero at the line minimum. If we were to move next along the gradient, as in the gradient-descent method, we would actually move away from the minimum. Instead, we move along a direction satisfying the following property:

   d^{(r)T} H d^{(r+1)} = 0,   where H is the Hessian matrix.

   Directions satisfying this condition are called conjugate directions, since they do not interfere with each other: along the conjugate direction, represented by the black arrow in Figure 5.37, the component of the gradient parallel to the previous direction (white lines) remains zero, so we are in fact moving along a sequence of minima relative to the previous direction.
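   As a rough illustration of this conjugacy condition (continuing the Python sketch above, with the same assumed H, not the text's function (5-43)), the next direction can be built from the new gradient plus a multiple of the previous direction, the multiple being chosen precisely so that the two directions are H-conjugate:

# Continue from the sketch above: build the next direction as the conjugate of d1.
g2 = grad_E(w1)                        # gradient at the line minimum reached along d1
beta = (g2 @ H @ d1) / (d1 @ H @ d1)   # coefficient that cancels the H-projection on d1
d2 = -g2 + beta * d1                   # next search direction

# d1 and d2 are conjugate with respect to H (they "do not interfere"):
print(np.isclose(d1 @ H @ d2, 0.0))    # True
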
   Next, we again determine the line minimum, which in this case is the global error minimum, and the process terminates. Otherwise, the previous two steps would have to be repeated until convergence.
   In general, for a quadratic error E defined by a w×w matrix H, there are w conjugate directions, which are orthogonal to each other only if H = I. The conjugate-gradient method guarantees a steadily decreasing error and finds a minimum within at most w iterations. It is, therefore, a very fast method. The only problem is that, once it reaches a local minimum, it will not move away from it.
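   The whole procedure can be summarised, for a quadratic error of the assumed generic form E(w) = 1/2 w^T H w + b^T w, by the following minimal Python sketch (again with illustrative H and b), which reaches the minimum within at most w line searches:

import numpy as np

def conjugate_gradient_quadratic(H, b, w0):
    """Minimize E(w) = 0.5 w^T H w + b^T w by the conjugate-gradient method.

    For a positive-definite w x w matrix H, the minimum is reached in at
    most w iterations (up to rounding errors).
    """
    w = w0.astype(float)
    g = H @ w + b                            # gradient at the starting point
    d = -g                                   # first direction: steepest descent
    for _ in range(len(b)):                  # at most w iterations
        alpha = -(g @ d) / (d @ H @ d)       # exact minimum along the line w + alpha*d
        w = w + alpha * d
        g = H @ w + b                        # gradient at the new line minimum
        if np.allclose(g, 0.0):              # already at the minimum
            break
        beta = (g @ H @ d) / (d @ H @ d)     # makes the new direction H-conjugate to d
        d = -g + beta * d
    return w

H = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
w_min = conjugate_gradient_quadratic(H, b, np.zeros(2))
print(np.allclose(H @ w_min + b, 0.0))       # gradient is (numerically) zero at the minimum
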
   The algorithm for implementing the conjugate-gradient method must solve two main tasks: the determination of the minimum along a line and an efficient method