A lower bound on the number of samples needed for learning a class of
functions mapping X into [0, 1], with error Pe and confidence level 1 - a, a < 0.01, is
(Anthony and Bartlett, 1999):

$n \geq \dfrac{\mathrm{fat}_F(Pe/q) - 1}{16\,q}$, for any $0 < q \leq 1/4$,        (5-58)

where q is the quantization level of the function values, i.e., they are computed as
$q \times \mathrm{trunc}(f(x_i)/q)$.
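For illustration, this quantization can be sketched in Python as follows (a minimal sketch; the output values and the level q = 0.25 are arbitrary choices, not taken from the text):

    import math

    def quantize(value, q):
        # Quantize a function value in [0, 1] to level q, i.e. q * trunc(value / q).
        return q * math.trunc(value / q)

    # With q = 0.25 the outputs are mapped onto multiples of 0.25.
    outputs = [0.07, 0.31, 0.58, 0.93]
    print([quantize(y, 0.25) for y in outputs])   # [0.0, 0.25, 0.5, 0.75]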
If we use formula (5-57) in (5-58) we obtain the estimated lower bound:

$n \geq \dfrac{V}{32\,Pe}$,        (5-58a)

where V is the total variation of the target function referred to the [0, 1] interval.
Let us apply this lower bound to the MLP3:3:1 regressing the temperature T for
the Weather dataset, with Pe = 0.08, as analysed in section 5.5.3. The total
variation of T(x) is 1573.9 °C, which we refer to the [0, 1] interval by dividing by
the max(T(x)) - min(T(x)) range, obtaining 59.4. Applying formula (5-58a) the
estimated lower bound is 23, which is largely exceeded by the number of training
samples (378).
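The arithmetic of this example can be checked with a short Python sketch; it assumes the reconstructed form of (5-58a), in which the quantization level q cancels out:

    # Values quoted in the text for the MLP3:3:1 regression of the temperature T.
    V = 59.4     # total variation of T(x), rescaled to the [0, 1] interval
    Pe = 0.08    # target error

    # Estimated lower bound on the number of training samples, formula (5-58a).
    n_lower = V / (32 * Pe)
    print(round(n_lower))   # 23, far below the 378 training samples actually used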
5.6.5 Risk Minimization
In the preceding section we saw how to characterize the complexity of a neural
network using the Vapnik-Chervonenkis and the fat-shattering dimensions, for
classification and regression problems, respectively. We will now proceed to see
how the complexity influences the classification or regression risk of a neural
network. For this purpose, let us denote by R(w) the risk incurred when using a neural
network with weight vector w:

$R(w) = \int L(t, y(x, w))\, p(x, t)\, dx\, dt$,

i.e., the expected loss L of the network output y(x, w) over the joint distribution of the inputs x and the target values t.
Notice that this is the same formula as (4-20a), section 4.2.1, now applied to a
continuous target space T. As we did in that section, and without impairing the main
conclusions, we will discuss the simplified situation of zero loss for correct
classifications and equal loss for wrong classifications. R(w) will then be,
simply, the classifier true error Pe(w). As usual, we do not know the true error, but
we are able to obtain the design set error estimate, $\hat{P}e_d(w)$, by counting the number
of individual errors.
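This counting can be sketched in a few lines of Python (hypothetical labels, not taken from the Weather example):

    def design_set_error(targets, predictions):
        # Design set error estimate: fraction of misclassified cases (0/1 loss).
        errors = sum(1 for t, y in zip(targets, predictions) if t != y)
        return errors / len(targets)

    # Hypothetical class labels and classifier outputs.
    t = [0, 1, 1, 0, 1, 0, 0, 1]
    y = [0, 1, 0, 0, 1, 1, 0, 1]
    print(design_set_error(t, y))   # 0.25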
An important result due to Vapnik (see Vapnik, 1998) relates
these two errors, for any weight vector w and arbitrarily small quantity ε, as
follows:
with