Page 208 -
P. 208

196      5 Neural Networks

                             closeness the true error rate, with an increasing number of patterns. In this formula
                             we may use for dvc the lower bound of formula (5-52).
                               An  upper bound on the sufficient number of patterns for network generalization,
                             is also presented in the same work (Blumer et al., 1989):






                               Another, tighter lower bound  formula, due to Ehrenfeucht et al. (1989) for the
                             special case of  Pe I 0.125 and  aI 0.01,  converges asymptotically to the above
                             upper bound, except for a factor of the order of log2(1/Pe):






                               The computation of formula (5-54) upper bound presupposes the knowledge of
                             the true VC dimension or at least an upper bound of it. Baum and Haussler (1989)
                             derived an upper bound for the VC dimension of an MLP with u units (hard-limiter
                             neurons) and w weights (including biases), as:




                             where e is the base of the natural logarithms.
                               With this result they derived also an  upper bound for the sufficient number of
                             patterns achieving generalization for an MLP with  Fed (design set error estimate).
                             They proved that such an MLP will have a test set error not worse then 2 Fed with
                             high probability 1-a, using:






                             with





                               The value of a is rather low (a<0.005) even for low values of d and h. A rule of
                             thumb, based  on  this  upper bound, estimates the number of  patterns needed for
                             sufficient generalization as wlPe for complex MLP (high w). This means that for
                             e.g. Pe = 0.1, one would need about 10 times more patterns than weights.
                               With  the  program  PR  Size (see  Appendix  B)  it  is  possible  to  compute the
                             bounds expressed by  the formulas (5-53) and (5-%a),  for any  values of d and h.
                             Figure  5.34  shows  these bounds for  d=2, 4  and  8. The  bounds  were computed
                             using Pe = Fed = 0.05 and a = 0.0 1.
   203   204   205   206   207   208   209   210   211   212   213