
5.6 Performance of Neural Networks


Next, we proceed to compute a second-order approximation of the error energy function at a point w*, using a Taylor series expansion around that point:

$$E(\mathbf{w}) \approx E(\mathbf{w}^*) + (\mathbf{w} - \mathbf{w}^*)^T \nabla E(\mathbf{w}^*) + \frac{1}{2}(\mathbf{w} - \mathbf{w}^*)^T \mathbf{H}\,(\mathbf{w} - \mathbf{w}^*),$$

where H is the Hessian matrix of E evaluated at w*.
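A minimal numerical sketch of this approximation, assuming a hypothetical two-parameter error surface (the function, expansion point, and step below are illustrative, not from the book):

```python
import numpy as np

# Hypothetical error surface (illustrative only):
# E(w) = w0^2 + 2*w1^2 + 0.1*(w0^4 + w1^4)
def E(w):
    return w[0]**2 + 2*w[1]**2 + 0.1*(w[0]**4 + w[1]**4)

def grad_E(w):
    return np.array([2*w[0] + 0.4*w[0]**3,
                     4*w[1] + 0.4*w[1]**3])

def hess_E(w):
    return np.diag([2 + 1.2*w[0]**2,
                    4 + 1.2*w[1]**2])

w_star = np.array([0.3, -0.2])           # expansion point
w = w_star + np.array([0.05, 0.02])      # nearby point
d = w - w_star

# Second-order Taylor approximation of E around w_star
E_approx = E(w_star) + grad_E(w_star) @ d + 0.5 * d @ hess_E(w_star) @ d
print(E(w), E_approx)                    # the two values agree up to O(||d||^3)
```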

At a minimum of E the gradient is zero, therefore the linear term vanishes and the approximation becomes:

$$E(\mathbf{w}) \approx E(\mathbf{w}^*) + \frac{1}{2}(\mathbf{w} - \mathbf{w}^*)^T \mathbf{H}\,(\mathbf{w} - \mathbf{w}^*).$$
This approximation has a local gradient given by:

$$\nabla E = \mathbf{H}\,(\mathbf{w} - \mathbf{w}^*). \qquad \text{(5-37a)}$$
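This follows by differentiating the quadratic term and using the symmetry of the Hessian:

$$\nabla_{\mathbf{w}}\!\left[\frac{1}{2}(\mathbf{w}-\mathbf{w}^*)^T \mathbf{H}\,(\mathbf{w}-\mathbf{w}^*)\right] = \frac{1}{2}\left(\mathbf{H} + \mathbf{H}^T\right)(\mathbf{w}-\mathbf{w}^*) = \mathbf{H}\,(\mathbf{w}-\mathbf{w}^*).$$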

Let us now expand w - w* as a linear combination of the eigenvectors u_i of the Hessian, which, as seen in section 2.3, form an orthonormal basis:

$$\mathbf{w} - \mathbf{w}^* = \sum_i \alpha_i \mathbf{u}_i. \qquad \text{(5-38)}$$

Since the eigenvectors are orthonormal, when multiplying both terms of (5-38) by $\mathbf{u}_i^T$ we obtain:

$$\alpha_i = \mathbf{u}_i^T (\mathbf{w} - \mathbf{w}^*),$$

which allows us to interpret the α_i as the distances to the minimum w* along the directions u_i.
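As a quick numerical check, the following sketch (with a hypothetical 2×2 Hessian and displacement, not taken from the book) verifies that projecting w - w* onto the orthonormal eigenvectors recovers the expansion coefficients α_i:

```python
import numpy as np

# Hypothetical symmetric Hessian and displacement w - w* (illustrative values)
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])
d = np.array([0.4, -0.1])          # w - w*

lam, U = np.linalg.eigh(H)         # columns of U are orthonormal eigenvectors u_i
alpha = U.T @ d                    # alpha_i = u_i^T (w - w*)
print(np.allclose(U @ alpha, d))   # True: w - w* = sum_i alpha_i u_i, as in (5-38)
```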
From (5-38) it is also easy to compute the weight update as:

$$\Delta\mathbf{w} = \sum_i \Delta\alpha_i\, \mathbf{u}_i.$$
On the other hand, substituting (5-35) and (5-38) in (5-37a), we obtain:

$$\nabla E = \sum_i \lambda_i \alpha_i \mathbf{u}_i,$$

where the λ_i are the eigenvalues of the Hessian.
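Continuing the numerical sketch above (same hypothetical Hessian and displacement), one can confirm that the exact quadratic gradient H(w - w*) coincides with this eigen-expansion:

```python
import numpy as np

H = np.array([[3.0, 1.0],          # hypothetical symmetric Hessian, as above
              [1.0, 2.0]])
d = np.array([0.4, -0.1])          # w - w*

lam, U = np.linalg.eigh(H)
alpha = U.T @ d
print(np.allclose(H @ d, U @ (lam * alpha)))   # True: grad E = sum_i lambda_i alpha_i u_i
```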
From the gradient descent formula (5-7) we know that the weight update is:

$$\Delta\mathbf{w} = -\eta\, \nabla E.$$
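Combined with the eigen-expansion of the gradient, this update decouples into independent one-dimensional updates, α_i ← (1 - ηλ_i)α_i, which converge when η < 2/λ_max. A minimal sketch, again using the hypothetical quadratic surface from the examples above:

```python
import numpy as np

# Gradient descent on the hypothetical quadratic surface E = E* + 0.5 d^T H d
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])
lam, U = np.linalg.eigh(H)
eta = 0.2                                # learning rate; stability needs eta < 2 / lam.max()

alpha = U.T @ np.array([0.4, -0.1])      # initial distances to the minimum, per (5-38)
for _ in range(50):
    alpha = (1.0 - eta * lam) * alpha    # each direction u_i decays at its own rate
print(alpha)                             # -> essentially zero: w has reached w*
```

Note that the slowest decay occurs along the eigenvector with the smallest eigenvalue, which is why an ill-conditioned Hessian makes plain gradient descent converge slowly.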