Figure 5.34. Bounds for two-class MLPs with hard-limiters, computed for Pe = 0.05 and δ = 0.01 (logarithmic vertical scale).



The PR Size program provides, therefore, appropriate guidance in the choice of h and n for classification problems with neural networks. For the two-class cork stoppers classification problem described in section 5.3, an MLP2:1 with hard-limiter was used, which for a 95% confidence level of the respective error estimate corresponds to n_l = 27 and n_u = 150. The lower bound is satisfied, since we are using 100 patterns. However, we may be somewhat far from the optimal classifier, since the number of patterns we have available is lower than the estimated sufficient number for generalization, n_u.
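Formulas (5-53) and (5-55), from which such bounds are obtained, appear earlier in the chapter and are not reproduced on this page. As a rough illustration of how pattern-size bounds of this kind are evaluated, the following Python sketch uses the classical distribution-free PAC bounds; its constants, and the example d_VC value, are assumptions and may differ from those used by PR Size.

```python
import math

def pac_sample_bounds(d_vc, eps, delta):
    """Classical distribution-free PAC sample-size bounds for a
    hypothesis class of VC dimension d_vc, error tolerance eps and
    confidence 1 - delta.  Constants follow the well-known results of
    Blumer et al. (1989) and Ehrenfeucht et al. (1989); the book's
    formulas (5-53) and (5-55) may use different constants."""
    # Necessary number of patterns: with fewer, no learning algorithm
    # can guarantee error <= eps with probability >= 1 - delta.
    n_lower = math.ceil((d_vc - 1) / (32.0 * eps))
    # Sufficient number of patterns for any consistent learner.
    n_upper = math.ceil(max((4.0 / eps) * math.log2(2.0 / delta),
                            (8.0 * d_vc / eps) * math.log2(13.0 / eps)))
    return n_lower, n_upper

# Illustrative values only (d_vc = 3 is an assumption, not the value
# used for the MLP2:1 of the cork-stoppers example).
print(pac_sample_bounds(d_vc=3, eps=0.05, delta=0.05))
```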
When applying these bounds to MLPs one must take into account that these are initially trained with small weights, thereby introducing a bias in favour of solutions with small weights. This, in fact, reduces the effective d_VC, favouring smaller training sets. On the other hand, for MLPs with sigmoid activation functions the d_VC will be at least as great as formula (5-53) indicates, since a sigmoid unit can approximate a hard-limiter with arbitrary accuracy using sufficiently large weights. Therefore, the lower bound formula (5-55) also applies to networks with sigmoid units. Appropriate formulas for the upper bounds in this situation are presented in (Anthony and Bartlett, 1999).
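The approximation argument can be checked numerically. The short sketch below (a construction of our own; the grid and gain values are arbitrary) shows that the logistic unit sigmoid(k·x) converges to the unit step away from the origin as the weight (gain) k grows:

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hard_limiter(x):
    return 1.0 if x >= 0 else 0.0

# On a fixed grid excluding x = 0, the maximum deviation between
# sigmoid(k*x) and the hard-limiter shrinks as the gain k grows.
xs = [i / 100.0 for i in range(-100, 101) if i != 0]
for k in (1, 10, 100, 1000):
    max_dev = max(abs(sigmoid(k * x) - hard_limiter(x)) for x in xs)
    print(f"k={k:5d}  max deviation = {max_dev:.6f}")
```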
Bounds on the necessary and sufficient numbers of patterns needed for learning a regression function have also been studied by several authors. These bounds depend on a generalized version of the Vapnik-Chervonenkis dimension.
We now present some important concepts on this matter by first defining the pseudo-shattering of a set of points X = {x_i; x_i ∈ ℜ, i = 1, ..., m} by a class of functions F, mapping X into the set of real numbers.

We say that X is pseudo-shattered by F if there are m real numbers r_i and 2^m functions of F that achieve all possible "above/below" combinations with respect to the r_i. This is illustrated in Figure 5.35 for the class of linear functions F = {f: f(x) = ax + b; a, b ∈ ℜ}. The points r_i are said to witness the pseudo-shattering.
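The definition can be made concrete with a small construction of our own (the points, witnesses and margin below are arbitrary choices): for two points on the real line, every one of the 2^m = 4 above/below patterns relative to the witnesses r_i is realized by some member of the affine class F.

```python
from itertools import product

# Witnessed pseudo-shattering of two points by F = {f(x) = a*x + b}:
# for every "above/below" pattern there must be an f in F with
# f(x_i) > r_i where the pattern says "above" and f(x_i) < r_i
# where it says "below".
x = [0.0, 1.0]          # the set X to be pseudo-shattered
r = [0.5, -0.25]        # candidate witnesses r_i

for signs in product([+1, -1], repeat=len(x)):
    # Interpolate the targets r_i + s_i, which lie strictly above or
    # below r_i; a line through two points exists since x[0] != x[1].
    t = [ri + si for ri, si in zip(r, signs)]
    a = (t[1] - t[0]) / (x[1] - x[0])
    b = t[0] - a * x[0]
    achieved = all((a * xi + b - ri) * si > 0
                   for xi, ri, si in zip(x, r, signs))
    print(f"pattern {signs}: f(x) = {a:+.2f}x {b:+.2f}  realized: {achieved}")
```

Since all four patterns are realized, the chosen r_i witness the pseudo-shattering of X. With three or more points some patterns become infeasible: the affine functions on ℜ form a two-dimensional vector space, whose pseudo-dimension is exactly 2.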