We already found the term (5-46a) in (5-33): it represents the conditional regression of the target data. The second term in (5-46) therefore reflects the variance in the target data and is totally independent of the network output $z_k$. It is the first term in (5-46) that is really interesting. Its integrand is:

$$\{z_k(\mathbf{x}) - E[t_k \mid \mathbf{x}]\}^2 . \qquad (5\text{-}47)$$

The optimum output of the network corresponds, of course, to $E[t_k \mid \mathbf{x}]$.
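In the standard form of this result (cf. Bishop, 1995), and keeping the notation of the surrounding text, the expected squared error for output $k$ decomposes (up to constant factors) as:

$$\int\!\!\int \{z_k(\mathbf{x}) - t_k\}^2\, p(t_k, \mathbf{x})\, dt_k\, d\mathbf{x} = \int \{z_k(\mathbf{x}) - E[t_k \mid \mathbf{x}]\}^2\, p(\mathbf{x})\, d\mathbf{x} + \int \big(E[t_k^2 \mid \mathbf{x}] - E[t_k \mid \mathbf{x}]^2\big)\, p(\mathbf{x})\, d\mathbf{x} .$$

The second integral is the conditional variance of the targets and cannot be reduced by any choice of $z_k$; only the first integral, with integrand (5-47), depends on the network.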
Imagine now that we had many training sets of size $n$ available, and wished to see how the error term that depends on the network is influenced by the particular choice of training set. For this purpose, let us consider the ensemble average of (5-47), $E_D$, computed over a potentially infinite number of training sets:

$$E_D\big[\{z_k(\mathbf{x}) - E[t_k \mid \mathbf{x}]\}^2\big].$$
The somewhat intricate computation of this ensemble average can be found in Bishop (1995) or Haykin (1999), where it is shown that it can be expressed as:

$$E_D\big[\{z_k - E[t_k \mid \mathbf{x}]\}^2\big] = \{E_D[z_k] - E[t_k \mid \mathbf{x}]\}^2 + E_D\big[\{z_k - E_D[z_k]\}^2\big].$$
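In outline, the computation proceeds by adding and subtracting $E_D[z_k]$ inside the square:

$$\{z_k - E[t_k \mid \mathbf{x}]\}^2 = \{z_k - E_D[z_k]\}^2 + \{E_D[z_k] - E[t_k \mid \mathbf{x}]\}^2 + 2\{z_k - E_D[z_k]\}\{E_D[z_k] - E[t_k \mid \mathbf{x}]\}.$$

Since $E_D[z_k - E_D[z_k]] = 0$ and the second factor of the cross term does not depend on the training set, the cross term vanishes under $E_D$, leaving the two terms above.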
- The first term represents the squared average deviation of the network outputs $z_k$ from the optimum solution $E[t_k \mid \mathbf{x}]$. It is therefore called the bias component of the error.
- The second term represents the average squared deviation of the output values from their ensemble average $E_D[z_k]$. It is therefore called the variance component of the error.
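To make these two components concrete, the following minimal Python sketch (an illustration, not part of the text: the sinusoidal target, noise level, and polynomial model are arbitrary assumptions) estimates both of them empirically by fitting one model per simulated training set:

import numpy as np

rng = np.random.default_rng(0)

def true_regression(x):
    # Assumed optimum solution E[t | x]
    return np.sin(2 * np.pi * x)

n, n_sets, degree, noise_std = 25, 500, 3, 0.3
x_grid = np.linspace(0.0, 1.0, 200)      # points at which z(x) is evaluated
preds = np.empty((n_sets, x_grid.size))  # one row of outputs per training set

for d in range(n_sets):
    # Draw one training set of size n: t = E[t | x] + random error
    x = rng.uniform(0.0, 1.0, n)
    t = true_regression(x) + rng.normal(0.0, noise_std, n)
    coefs = np.polyfit(x, t, degree)      # least-squares polynomial fit
    preds[d] = np.polyval(coefs, x_grid)  # model output z(x) for this set

ensemble_avg = preds.mean(axis=0)  # E_D[z]
# Averages over x_grid approximate the integrals over p(x) (x is uniform here)
bias2 = np.mean((ensemble_avg - true_regression(x_grid)) ** 2)
variance = np.mean((preds - ensemble_avg) ** 2)
print(f"bias^2 ~ {bias2:.4f}   variance ~ {variance:.4f}")

Raising the polynomial degree typically drives the bias estimate down and the variance estimate up, exhibiting the trade-off between the two components.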
Imagine that we had designed a neural network to regress target values given by the sum of function values $z(x_i)$ and a random error term $e(x_i)$, in a similar way as in (5-30):