closeness to the true error rate, with an increasing number of patterns. In this formula we may use for dVC the lower bound of formula (5-52).
An upper bound on the sufficient number of patterns for network generalization is also presented in the same work (Blumer et al., 1989):

\[ n \ge \max\left( \frac{4}{P_e}\,\log_2\frac{2}{a},\; \frac{8\,d_{VC}}{P_e}\,\log_2\frac{13}{P_e} \right) \tag{5-54} \]
Another, tighter lower bound formula, due to Ehrenfeucht et al. (1989) for the special case of Pe ≤ 0.125 and a ≤ 0.01, converges asymptotically to the above upper bound, except for a factor of the order of log2(1/Pe):

\[ n \ge \frac{d_{VC}-1}{32\,P_e} \tag{5-55} \]
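Both expressions are simple closed forms and easy to evaluate numerically. The following minimal Python sketch (illustrative only; it is not part of the book or of the PR Size program, and the function names are ours) evaluates the sufficient-size bound (5-54) and the necessary-size bound (5-55) for a classifier family with a given VC dimension:

import math

def blumer_sufficient_n(d_vc, pe, a):
    """Sufficient number of patterns (5-54): from this sample size upward,
    a true error rate of at most pe is guaranteed with confidence 1 - a."""
    term1 = (4.0 / pe) * math.log2(2.0 / a)
    term2 = (8.0 * d_vc / pe) * math.log2(13.0 / pe)
    return math.ceil(max(term1, term2))

def ehrenfeucht_necessary_n(d_vc, pe):
    """Necessary number of patterns (5-55), valid for pe <= 0.125 and
    a <= 0.01: no learning algorithm can succeed with fewer patterns."""
    return math.ceil((d_vc - 1) / (32.0 * pe))

# Hypothetical example: a classifier family with d_VC = 50.
for pe in (0.05, 0.1):
    print(pe, ehrenfeucht_necessary_n(50, pe),
          blumer_sufficient_n(50, pe, a=0.01))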
The computation of the formula (5-54) upper bound presupposes the knowledge of the true VC dimension, or at least an upper bound of it. Baum and Haussler (1989) derived an upper bound for the VC dimension of an MLP with u units (hard-limiter neurons) and w weights (including biases), as:

\[ d_{VC} \le 2\,w\,\log_2(e\,u) \tag{5-56} \]

where e is the base of the natural logarithms.
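To make formula (5-56) concrete, the following Python sketch (again illustrative; the one-hidden-layer, single-output topology is our assumption) counts the units u and the weights w of an MLP with d inputs and h hidden neurons, and then evaluates the bound:

import math

def mlp_units_weights(d, h, outputs=1):
    # Computation units (inputs are not computation units) and
    # weights including biases, for a one-hidden-layer MLP.
    u = h + outputs
    w = h * (d + 1) + outputs * (h + 1)
    return u, w

def vc_upper_bound(w, u):
    # Baum-Haussler bound (5-56): d_VC <= 2 w log2(e u).
    return 2.0 * w * math.log2(math.e * u)

u, w = mlp_units_weights(d=4, h=10)   # u = 11, w = 61
print(vc_upper_bound(w, u))           # about 598, an upper bound on d_VC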
With this result they also derived an upper bound for the sufficient number of patterns achieving generalization for an MLP with design set error estimate Ped. They proved that such an MLP will have a test set error not worse than 2Ped, with high probability 1-a, using:

\[ n \ge \frac{32\,w}{P_e}\,\ln\frac{32\,u}{P_e} \tag{5-57a} \]

with Pe = 2Ped. The value of a is rather low (a < 0.005) even for low values of d and h. A rule of thumb based on this upper bound estimates the number of patterns needed for sufficient generalization as w/Pe for a complex MLP (high w). This means that for, e.g., Pe = 0.1, one would need about 10 times more patterns than weights.
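Continuing the sketch above (same assumed topology), one can compare the full bound (5-57a) with the w/Pe rule of thumb; the constant and the logarithmic factor that the rule of thumb drops are considerable:

import math

def baum_haussler_n(w, u, pe):
    # Sufficient patterns (5-57a): n >= (32 w / pe) ln(32 u / pe).
    return math.ceil((32.0 * w / pe) * math.log(32.0 * u / pe))

w, u, pe = 61, 11, 0.1                # the d = 4, h = 10 MLP from above
print(baum_haussler_n(w, u, pe))      # full bound: about 160000 patterns
print(round(w / pe))                  # rule of thumb: 610, i.e. 10 times w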
With the program PR Size (see Appendix B) it is possible to compute the bounds expressed by the formulas (5-53) and (5-57a), for any values of d and h. Figure 5.34 shows these bounds for d = 2, 4 and 8. The bounds were computed using Pe = Ped = 0.05 and a = 0.01.
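The following short loop is a rough stand-in for that computation (it is not the PR Size program; since the closeness formula (5-53) lies on the preceding page, the Blumer bound (5-54) combined with the VC upper bound (5-56) is used here instead). It prints both bounds for d = 2, 4 and 8 over a range of h, with Pe = Ped = 0.05 and a = 0.01 as in Figure 5.34:

import math

PE, A = 0.05, 0.01                        # Pe = Ped = 0.05, a = 0.01

for d in (2, 4, 8):
    for h in (2, 4, 8, 16):
        u = h + 1                         # one output unit assumed
        w = h * (d + 1) + (h + 1)         # weights including biases
        d_vc = 2 * w * math.log2(math.e * u)                    # (5-56)
        n_blu = max((4 / PE) * math.log2(2 / A),
                    (8 * d_vc / PE) * math.log2(13 / PE))       # (5-54)
        n_bh = (32 * w / PE) * math.log(32 * u / PE)            # (5-57a)
        print(f"d={d} h={h:2d} w={w:3d} "
              f"n(5-54)={n_blu:9.0f} n(5-57a)={n_bh:9.0f}")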