
5.6 Performance of Neural Networks


                               dichotomies reaches the value 2^n for a maximum of n = d+1 patterns in accordance
                               with formula (5-51). Thus, for this case, d_VC = d+1.
                                 Below d_VC the neural net learns the training set without error and does not
                               generalize. It is only above d_VC that the neural net is capable of generalization.
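
                                 As a quick check of this statement, here is a minimal sketch, assuming that
                               formula (5-51) is the usual count of linearly separable dichotomies of n points
                               in general position (Cover's counting function); it compares that count with 2^n,
                               and the two agree up to n = d+1 and diverge afterwards, in agreement with
                               d_VC = d+1. The function name below is purely illustrative.

                               from math import comb

                               def ls_dichotomies(n, d):
                                   # Linearly separable dichotomies of n points in general position in R^d
                                   # (Cover's counting function; assumed to correspond to formula (5-51)).
                                   return 2 * sum(comb(n - 1, i) for i in range(d + 1))

                               d = 2
                               for n in range(1, 7):
                                   print(n, ls_dichotomies(n, d), 2 ** n)
                               # For d = 2 the counts agree with 2^n (2, 4, 8) up to n = d+1 = 3; for n = 4
                               # the count is 14 < 16, so four points in the plane cannot always be shattered.
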
                                 In general, the Vapnik-Chervonenkis dimension is, unfortunately, very difficult
                               to determine. Notice that in order to show that the VC dimension of an MLP is at
                               least m, one must simply find some set of m regularly distributed points shattered
                               by the MLP. However, in order to show that the VC dimension of an MLP is at most
                               m, one must show that no set of m+1 regularly distributed points is shattered by
                               the MLP. Therefore, in the case of Figure 5.32, a lower bound on d_VC is 5. As a
                               matter of fact, a lower bound for the VC dimension of a two-class MLP, with hard
                               limiters, is given precisely by the maximum number of linearly separable regions,
                               formula (5-52).
                                 Figure 5.33 shows the values of this lower bound for several values of d and h,
                               obtained with the PR Size program (see Appendix B).
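
                                 As an illustration, the following sketch computes a table of the same kind,
                               assuming that formula (5-52) is the classical count of the regions into which h
                               hyperplanes divide d-dimensional space (the sum of C(h, i) for i = 0, ..., d);
                               the values actually plotted in Figure 5.33 are produced by the PR Size program
                               and need not coincide exactly with this simplified sketch.

                               from math import comb

                               def vc_lower_bound(h, d):
                                   # Maximum number of regions carved out of R^d by h hyperplanes
                                   # (assumed form of formula (5-52)), taken as a lower bound on d_VC
                                   # for a two-class MLP with d inputs and h hard-limiter hidden units.
                                   return sum(comb(h, i) for i in range(d + 1))

                               for d in (2, 4, 8):
                                   print(d, [vc_lower_bound(h, d) for h in (2, 4, 8, 16)])
                               # The d = 2 row is [4, 11, 37, 137]; the rapid growth with h is consistent
                               # with the logarithmic vertical scale used in Figure 5.33.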

                               Figure 5.33. Lower bound on the VC dimension for a two-class MLP with hard-
                               limited activation functions (logarithmic vertical scale).


                                 Bounds on the training set size needed for MLP generalization depend  on the
                               VC  dimension.  A  lower  bound  on  the  number  of  training  samples  needed  for
                               generalization is due to Blumer et al. (1989):

                                        n ≥ max[ ((1 - Pe)/Pe) ln(1/a), d_VC (1 - 2(Pe(1 - a) + a)) ] ,


                                 where Pe is the true error rate of the neural network, and 1 - a is the confidence
                               level that the design set estimate of the error rate will approximate the true
                               error rate with arbitrary closeness.
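
                                 As a rough numerical illustration of the bound in the form reconstructed above
                               (the values and the function name below are arbitrary and only meant to show the
                               order of magnitude involved):

                               from math import log

                               def blumer_lower_bound(pe, a, d_vc):
                                   # Lower bound on the number of training samples (Blumer et al., 1989),
                                   # using the form reconstructed above:
                                   # n >= max[((1 - pe)/pe) ln(1/a), d_vc (1 - 2(pe(1 - a) + a))].
                                   return max((1 - pe) / pe * log(1 / a),
                                              d_vc * (1 - 2 * (pe * (1 - a) + a)))

                               # Illustrative values: Pe = 0.05, confidence 1 - a = 0.95, d_VC = 50.
                               print(blumer_lower_bound(0.05, 0.05, 50))   # about 57 samples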