Next, we proceed to compute a second-order approximation of the error energy function at a point w*, using a Taylor series expansion around that point:

\[ E(\mathbf{w}) \approx E(\mathbf{w}^*) + (\mathbf{w}-\mathbf{w}^*)^T \nabla E(\mathbf{w}^*) + \tfrac{1}{2}(\mathbf{w}-\mathbf{w}^*)^T \mathbf{H}\,(\mathbf{w}-\mathbf{w}^*) \]  (5-36)

where H is the Hessian matrix of E evaluated at w*.
At a minimum of E the gradient is zero; therefore the linear term vanishes and the approximation becomes:

\[ E(\mathbf{w}) \approx E(\mathbf{w}^*) + \tfrac{1}{2}(\mathbf{w}-\mathbf{w}^*)^T \mathbf{H}\,(\mathbf{w}-\mathbf{w}^*) \]  (5-37)
This approximation has a local gradient given by:

\[ \nabla E = \mathbf{H}\,(\mathbf{w}-\mathbf{w}^*) \]  (5-37a)
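As a quick numerical illustration of (5-37a), the following NumPy sketch compares this formula with a finite-difference gradient of the quadratic approximation; the Hessian H, the minimum w*, and the test point are illustrative values of our own choosing, not taken from the text:

```python
import numpy as np

H = np.array([[3.0, 1.0],
              [1.0, 2.0]])      # illustrative symmetric Hessian
w_star = np.array([1.0, -2.0])  # illustrative minimum w*

def E(w):
    # Quadratic approximation around w*, taking E(w*) = 0.
    d = w - w_star
    return 0.5 * d @ H @ d

def grad_E(w):
    # Local gradient predicted by (5-37a).
    return H @ (w - w_star)

w = np.array([2.5, 0.5])        # arbitrary test point
eps = 1e-6
fd = np.array([(E(w + eps * e) - E(w - eps * e)) / (2 * eps)
               for e in np.eye(2)])
print(np.allclose(grad_E(w), fd))  # True
```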
Let us now expand w - w* as a linear combination of the eigenvectors u_i of H, which, as seen in section 2.3, form an orthonormal basis:

\[ \mathbf{w}-\mathbf{w}^* = \sum_i \alpha_i \mathbf{u}_i \]  (5-38)
Since the eigenvectors are orthonormal, multiplying both sides of (5-38) by u_i^T we obtain:

\[ \alpha_i = \mathbf{u}_i^T(\mathbf{w}-\mathbf{w}^*) \]

which allows us to interpret the α_i as the distance to the minimum w* along the direction u_i.
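Continuing the same illustrative quadratic, the coordinates α_i of (5-38) can be recovered by projecting w - w* onto the orthonormal eigenbasis that numpy.linalg.eigh returns for a symmetric matrix:

```python
import numpy as np

H = np.array([[3.0, 1.0],
              [1.0, 2.0]])      # same illustrative Hessian
w_star = np.array([1.0, -2.0])
w = np.array([2.5, 0.5])

lam, U = np.linalg.eigh(H)      # columns of U are the eigenvectors u_i
alpha = U.T @ (w - w_star)      # alpha_i = u_i^T (w - w*)

# The expansion (5-38) reconstructs w - w* exactly.
print(np.allclose(U @ alpha, w - w_star))  # True
```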
From (5-38) it is also easy to compute the weight update as:

\[ \Delta\mathbf{w} = \sum_i \Delta\alpha_i \, \mathbf{u}_i \]
On the other hand, substituting (5-35) and (5-38) in (5-37a), we obtain:

\[ \nabla E = \sum_i \lambda_i \alpha_i \mathbf{u}_i \]
From the gradient descent formula (5-7) we know that \( \Delta\mathbf{w} = -\eta \nabla E \); equating the two expressions for \( \Delta\mathbf{w} \), the update of each coordinate therefore decouples:

\[ \Delta\alpha_i = -\eta\,\lambda_i\,\alpha_i \]
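Putting the pieces together, a short sketch on the same illustrative quadratic confirms that gradient descent rescales each coordinate independently by the factor (1 - η λ_i); the learning rate η = 0.1 is an arbitrary choice:

```python
import numpy as np

H = np.array([[3.0, 1.0],
              [1.0, 2.0]])      # same illustrative Hessian
w_star = np.array([1.0, -2.0])
lam, U = np.linalg.eigh(H)

eta = 0.1                       # illustrative learning rate
w = np.array([2.5, 0.5])
alpha = U.T @ (w - w_star)

for _ in range(5):
    w = w - eta * (H @ (w - w_star))    # gradient descent step (5-7)
    alpha = (1.0 - eta * lam) * alpha   # predicted per-direction update
    print(np.allclose(U.T @ (w - w_star), alpha))  # True at every step
```

Since each α_i is multiplied by (1 - η λ_i) at every step, convergence along every eigendirection requires |1 - η λ_i| < 1, i.e. η < 2/λ_max.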