Figure 5.30. Gradient descent in a parabolic error energy. The arrows show the progress of gradient descent starting from the point (-1, -1) and using two different learning factors: (a) η = 0.15; (b) η = 0.45.
5.6.3 Bias and Variance in NN Design
When we described the performance of Bayesian classifiers in section 4.5, a trade-off between the bias and the variance of the error estimates was presented. We saw that whereas training set error estimates had, on average, zero variance and a bias inversely dependent on the number of training samples, test set error estimates were, on average, unbiased but had a variance inversely dependent on the number of test samples. The training and generalization properties of a neural network exhibit a similar bias-variance trade-off. This can be understood by decomposing the error energy into bias and variance components, as proposed by Geman et al. (1992). We describe next the main aspects of this issue.
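To make this trade-off concrete, consider the following minimal sketch (not from the book; the nearest-class-mean classifier and the two Gaussian classes are hypothetical stand-ins), which repeatedly compares the resubstitution (training set) error estimate with an independent test set estimate:

import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # Balanced two-class sample: two 2-D Gaussians with unit covariance,
    # class means at (0, 0) and (2, 0).
    half = n // 2
    y = np.repeat([0, 1], half)
    X = rng.normal(size=(2 * half, 2))
    X[y == 1, 0] += 2.0
    return X, y

def error_rate(X, y, m0, m1):
    # Nearest-class-mean rule: assign each pattern to the closer mean.
    d0 = np.linalg.norm(X - m0, axis=1)
    d1 = np.linalg.norm(X - m1, axis=1)
    return np.mean((d1 < d0) != y)

n_test, reps = 200, 500
for n_train in (10, 50, 250):
    train_err, test_err = [], []
    for _ in range(reps):
        Xtr, ytr = sample(n_train)
        m0 = Xtr[ytr == 0].mean(axis=0)
        m1 = Xtr[ytr == 1].mean(axis=0)
        Xte, yte = sample(n_test)
        train_err.append(error_rate(Xtr, ytr, m0, m1))  # resubstitution estimate
        test_err.append(error_rate(Xte, yte, m0, m1))   # independent estimate
    print(f"n_train={n_train:3d}  "
          f"train: mean={np.mean(train_err):.3f} var={np.var(train_err):.5f}  "
          f"test: mean={np.mean(test_err):.3f} var={np.var(test_err):.5f}")

As n_train grows, the optimistic bias of the training estimate shrinks, while the variance of the test estimate is governed by n_test, in line with the behaviour recalled above.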
Let us consider the average squared error of a neural network attempting to adjust its outputs zk(xi) to the target values tk(xi) for the input patterns xi. Taking into account formulas (5-2a) and (5-29d), this is given by:
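Formulas (5-2a) and (5-29d) did not survive this extraction; a plausible reconstruction, assuming the usual mean-squared form over the n training patterns, is:

\[
E = \frac{1}{n} \sum_{i=1}^{n} \bigl( z_k(\mathbf{x}_i) - t_k(\mathbf{x}_i) \bigr)^2 .
\]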
If we let the size of the training set grow to infinity, this formula transforms into an integral (see also (5-31)):
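The integral itself is missing from this extraction; a standard reconstruction, assuming the targets are random variables with joint density p(x, tk) over inputs and targets, is:

\[
E = \int \bigl( z_k - t_k \bigr)^2 \, p(\mathbf{x}, t_k) \, d\mathbf{x} \, dt_k ,
\]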
with simplified writing of the functions zk(x), tk(x).
Adding and subtracting E[tk | x] inside the parentheses, the integral can be decomposed into a sum of two terms, as follows:
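The decomposed form is likewise missing here; a reconstruction following Geman et al. (1992), under the assumptions above, is:

\[
E = \int \bigl( z_k - E[t_k \mid \mathbf{x}] \bigr)^2 p(\mathbf{x}) \, d\mathbf{x}
  + \int E\bigl[ (t_k - E[t_k \mid \mathbf{x}])^2 \mid \mathbf{x} \bigr] \, p(\mathbf{x}) \, d\mathbf{x} .
\]

The cross term vanishes because E[tk - E[tk | x] | x] = 0. The first term measures how closely the network output zk approximates the regression of the targets on the inputs; the second term is the intrinsic variance of the targets, which does not depend on the network at all.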