Next, we proceed to compute a second-order approximation of the error energy function at a point w*, using a Taylor series expansion around that point:

\[ E(\mathbf{w}) \approx E(\mathbf{w}^*) + (\mathbf{w}-\mathbf{w}^*)^T \nabla E(\mathbf{w}^*) + \tfrac{1}{2}(\mathbf{w}-\mathbf{w}^*)^T \mathbf{H}\,(\mathbf{w}-\mathbf{w}^*) \]  (5-36)

where H is the Hessian matrix of E evaluated at w*.
At a minimum of E the gradient is zero; therefore the linear term vanishes and the approximation becomes:

\[ E(\mathbf{w}) \approx E(\mathbf{w}^*) + \tfrac{1}{2}(\mathbf{w}-\mathbf{w}^*)^T \mathbf{H}\,(\mathbf{w}-\mathbf{w}^*) \]  (5-37)
This approximation has a local gradient given by:

\[ \nabla E = \mathbf{H}\,(\mathbf{w}-\mathbf{w}^*) \]  (5-37a)
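As a quick numerical illustration of (5-37a), the following NumPy sketch compares this formula with a finite-difference gradient of the quadratic approximation; the Hessian H, the minimum w*, and the test point are illustrative values of our own choosing, not taken from the text:

```python
import numpy as np

H = np.array([[3.0, 1.0],
              [1.0, 2.0]])      # illustrative symmetric Hessian
w_star = np.array([1.0, -2.0])  # illustrative minimum w*

def E(w):
    # Quadratic approximation around w*, taking E(w*) = 0.
    d = w - w_star
    return 0.5 * d @ H @ d

def grad_E(w):
    # Local gradient predicted by (5-37a).
    return H @ (w - w_star)

w = np.array([2.5, 0.5])        # arbitrary test point
eps = 1e-6
fd = np.array([(E(w + eps * e) - E(w - eps * e)) / (2 * eps)
               for e in np.eye(2)])
print(np.allclose(grad_E(w), fd))  # True
```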
Let us now expand w - w* as a linear combination of the eigenvectors u_i of H, which, as seen in section 2.3, form an orthonormal basis:

\[ \mathbf{w}-\mathbf{w}^* = \sum_i \alpha_i \mathbf{u}_i \]  (5-38)
Since the eigenvectors are orthonormal, multiplying both sides of (5-38) by u_i^T we obtain:

\[ \alpha_i = \mathbf{u}_i^T(\mathbf{w}-\mathbf{w}^*) \]

which allows us to interpret the α_i as the distance to the minimum w* along the direction u_i.
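Continuing the same illustrative quadratic, the coordinates α_i of (5-38) can be recovered by projecting w - w* onto the orthonormal eigenbasis that numpy.linalg.eigh returns for a symmetric matrix:

```python
import numpy as np

H = np.array([[3.0, 1.0],
              [1.0, 2.0]])      # same illustrative Hessian
w_star = np.array([1.0, -2.0])
w = np.array([2.5, 0.5])

lam, U = np.linalg.eigh(H)      # columns of U are the eigenvectors u_i
alpha = U.T @ (w - w_star)      # alpha_i = u_i^T (w - w*)

# The expansion (5-38) reconstructs w - w* exactly.
print(np.allclose(U @ alpha, w - w_star))  # True
```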
From (5-38) it is also easy to compute the weight update as:

\[ \Delta\mathbf{w} = \sum_i \Delta\alpha_i \, \mathbf{u}_i \]
On the other hand, substituting (5-35) and (5-38) in (5-37a), we obtain:

\[ \nabla E = \sum_i \lambda_i \alpha_i \mathbf{u}_i \]
From the gradient descent formula (5-7) we know that \( \Delta\mathbf{w} = -\eta \nabla E \); equating the two expressions for \( \Delta\mathbf{w} \), the update of each coordinate therefore decouples:

\[ \Delta\alpha_i = -\eta\,\lambda_i\,\alpha_i \]
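Putting the pieces together, a short sketch on the same illustrative quadratic confirms that gradient descent rescales each coordinate independently by the factor (1 - η λ_i); the learning rate η = 0.1 is an arbitrary choice:

```python
import numpy as np

H = np.array([[3.0, 1.0],
              [1.0, 2.0]])      # same illustrative Hessian
w_star = np.array([1.0, -2.0])
lam, U = np.linalg.eigh(H)

eta = 0.1                       # illustrative learning rate
w = np.array([2.5, 0.5])
alpha = U.T @ (w - w_star)

for _ in range(5):
    w = w - eta * (H @ (w - w_star))    # gradient descent step (5-7)
    alpha = (1.0 - eta * lam) * alpha   # predicted per-direction update
    print(np.allclose(U.T @ (w - w_star), alpha))  # True at every step
```

Since each α_i is multiplied by (1 - η λ_i) at every step, convergence along every eigendirection requires |1 - η λ_i| < 1, i.e. η < 2/λ_max.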