
5.6  Performance of Neural Networks   189













                                   Figure 5.30. Gradient descent in a parabolic error energy. The arrows show the
                                   progress of gradient descent starting from the point (-1, -1) and using two different
                                   learning factors: (a) η = 0.15; (b) η = 0.45.
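
                                   The effect of the learning factor can be reproduced with a minimal sketch.
                                   The error surface used in the figure is not stated here, so the code assumes a
                                   hypothetical parabolic error energy E(w) = w1^2 + 2*w2^2; the names CURVATURE,
                                   error_energy, gradient and gradient_descent are illustrative only. Under this
                                   assumption the smaller factor approaches the minimum from one side, while the
                                   larger one overshoots along the steeper axis and changes sign at every step.

```python
import numpy as np

# Hypothetical parabolic error energy E(w) = w1^2 + 2*w2^2 (an assumption;
# the surface used in Figure 5.30 is not given in the text).
CURVATURE = np.array([1.0, 2.0])


def error_energy(w):
    """Value of the assumed parabolic error energy at point w."""
    return float(np.sum(CURVATURE * w ** 2))


def gradient(w):
    """Gradient of the assumed error energy at point w."""
    return 2.0 * CURVATURE * w


def gradient_descent(start, eta, steps=10):
    """Return the array of points visited by plain gradient descent."""
    w = np.asarray(start, dtype=float)
    path = [w.copy()]
    for _ in range(steps):
        w = w - eta * gradient(w)  # step against the gradient, scaled by eta
        path.append(w.copy())
    return np.array(path)


# Same starting point and learning factors as in the figure caption.
path_a = gradient_descent((-1.0, -1.0), eta=0.15)
path_b = gradient_descent((-1.0, -1.0), eta=0.45)

for name, path in (("eta=0.15", path_a), ("eta=0.45", path_b)):
    print(name, "final point:", path[-1], "E =", error_energy(path[-1]))
```

                                   Plotting the rows of path_a and path_b over the contours of error_energy would
                                   give a picture of the same kind as the figure, though not necessarily the same
                                   trajectories.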




                                   5.6.3 Bias and Variance in NN Design

                                   When describing the performance of Bayesian classifiers in section 4.5, a trade-off
                                   between the bias and variance of the error estimates was presented. Specifically, we
                                   saw that training set error estimates had, on average, zero variance and a bias
                                   inversely dependent on the number of training samples, whereas test set error
                                   estimates were, on average, unbiased but had a variance inversely dependent on the
                                   number of test samples. The training and generalization properties of a neural
                                   network also exhibit a bias-variance trade-off. This can be understood by
                                   decomposing the error energy into bias and variance components, as proposed by
                                   Geman et al. (1992). We now describe the main aspects of this issue.
                                     Let us consider the average squared error of a neural network attempting to
                                   adjust its outputs zk(xi) to the target values fk(xi) for the input patterns xi.
                                   Taking into account formulas (5-2a) and (5-29d), this is given by:






                                     If we let the size of the training set grow to infinity, this formula transforms
                                   into the integral (see also (5-31)):






                                   with simplified writing of the functions zk(x), fk(x).
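
                                     The passage from the finite-sample average to its limiting expectation can be
                                   illustrated numerically. The sketch below is a toy setup, not the book's
                                   formulas: it assumes a single output, a hypothetical fixed "network" z(x) = 0.8x,
                                   hypothetical noisy targets equal to x plus Gaussian noise, inputs uniform on
                                   [0, 1], and an average squared error defined as the plain mean of squared
                                   differences with no constant factor. All function names are illustrative.

```python
import numpy as np

NOISE_STD = 0.2  # hypothetical target noise level (assumption)


def network_output(x):
    """Hypothetical fixed 'network': a slightly wrong linear map."""
    return 0.8 * x


def sample_targets(x, rng):
    """Hypothetical noisy targets: the function x plus Gaussian noise."""
    return x + rng.normal(0.0, NOISE_STD, size=x.shape)


def average_squared_error(n, rng):
    """Mean squared difference between outputs and targets over n patterns."""
    x = rng.uniform(0.0, 1.0, size=n)
    t = sample_targets(x, rng)
    return float(np.mean((network_output(x) - t) ** 2))


# Closed-form expectation for this toy setup:
# E[(0.8x - x - e)^2] = 0.04 * E[x^2] + NOISE_STD^2, with E[x^2] = 1/3.
expected_error = 0.04 / 3.0 + NOISE_STD ** 2

rng = np.random.default_rng(0)
errors = {n: average_squared_error(n, rng) for n in (10, 1000, 100000)}
for n, err in errors.items():
    print(f"n={n:>6}  average squared error={err:.4f}  limit={expected_error:.4f}")
```

                                     For small n the average fluctuates noticeably from one sample to another; as n
                                   grows it settles near the value of the expectation, which is the role played by
                                   the integral in the limit.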
                                      Adding and subtracting E[fk | x] to the expression inside the parentheses, the
                                    integral can be decomposed into a sum of two terms, as follows: