

Let us analyse in particular the case d=2, h=1. One discriminant defines, of course, 2 linearly separable regions, which a single perceptron⁴ implements. As confirmation, formula (5-52) gives R(1,2)=2. On the other hand, with one discriminant, the maximum number of pattern dichotomies one can have is given by the previous formula (5-51) for D(n, d), represented in Figure 5.31. Notice that up to n=3, all 2^n dichotomies are linearly separable. For increasing values of n, the number of dichotomies that can be implemented falls considerably short of the number of possible dichotomies.
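These counts are easy to check numerically. The short Python sketch below assumes the standard forms of the two formulas, namely Cover's count D(n, d) = 2 * sum_{i=0..d} C(n-1, i) for formula (5-51) and the region count R(h, d) = sum_{i=0..d} C(h, i) for formula (5-52); the function names are ours.

    from math import comb

    def D(n, d):
        # Assumed form of formula (5-51): maximum number of linearly
        # separable dichotomies of n points in general position in d
        # dimensions.
        return 2 * sum(comb(n - 1, i) for i in range(d + 1))

    def R(h, d):
        # Assumed form of formula (5-52): maximum number of regions
        # defined by h hyperplane discriminants in d dimensions.
        return sum(comb(h, i) for i in range(d + 1))

    print(R(1, 2), R(2, 2))        # 2 and 4, the values quoted in the text
    for n in range(1, 7):          # D(n,2) matches 2**n only up to n=3
        print(n, D(n, 2), 2 ** n)

Running it reproduces R(1,2)=2 and R(2,2)=4, and shows D(n,2) falling behind 2^n from n=4 onwards.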
Let us now increase the number of hidden neurons to h=2, for a d=2 problem. We then have a maximum of R(2,2)=4 linearly separable regions defined by two discriminants. With n=4 regularly distributed patterns, all the possible dichotomies (class labellings) can be implemented with these two discriminants (Figure 5.32a). The same happens for a set of n=5 patterns forming a convex hull, as shown in Figure 5.32b. However, for a set of n=6 patterns forming a convex hull, it is no longer possible to obtain all dichotomies (Figure 5.32c).
Figure 5.32. Dichotomies for a two-layer MLP with h=2: (a) Implemented with three regions, n=4; (b) Implemented with four regions, n=5; (c) Cannot be implemented with four regions, n=6.
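The behaviour shown in Figure 5.32 can also be verified by brute force. The sketch below is only illustrative and rests on two assumptions of ours: the n patterns are in convex position (so each line discriminant picks out a contiguous arc of the circular ordering, or its complement), and the net is a two-layer threshold MLP whose output unit can realise any linearly separable Boolean function of the two hidden outputs (all 16 two-input truth tables except XOR and XNOR). All function names are ours.

    from itertools import product

    def arc_subsets(n):
        # All subsets of n convex-position points that one halfplane can
        # pick out: the empty set, the full set and every contiguous arc.
        subs = {frozenset(), frozenset(range(n))}
        for start in range(n):
            for length in range(1, n):
                subs.add(frozenset((start + k) % n for k in range(length)))
        return subs

    def separable_boolean_functions():
        # The 14 linearly separable Boolean functions of two inputs,
        # i.e. every truth table except XOR and XNOR.
        codes = [(0, 0), (0, 1), (1, 0), (1, 1)]
        return [dict(zip(codes, bits))
                for bits in product((0, 1), repeat=4)
                if bits not in ((0, 1, 1, 0), (1, 0, 0, 1))]

    def implementable_dichotomies(n):
        # Distinct labellings of n convex-position points realisable with
        # two line discriminants feeding a linearly separable output unit.
        arcs, ls_fns = arc_subsets(n), separable_boolean_functions()
        found = set()
        for A in arcs:
            for B in arcs:
                codes = [(int(i in A), int(i in B)) for i in range(n)]
                for f in ls_fns:
                    found.add(tuple(f[c] for c in codes))
        return found

    for n in (4, 5, 6):
        print(f"n={n}: {len(implementable_dichotomies(n))} of {2 ** n}")

The run reports all 16 dichotomies for n=4 and all 32 for n=5, but fewer than 64 for n=6, in agreement with Figure 5.32.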



Therefore, for h=2 hidden neurons, the number of dichotomies that can be implemented departs, at n=6, from the number of possible dichotomies. In general, this behaviour is obtained for any value of h and is related to the important concept of the Vapnik-Chervonenkis dimension.
Let G(n) be the maximum number of distinct dichotomies implemented by a neural net. This function, called the growth function, is shown with a solid line in Figure 5.31 for the case MLP:2:1:1. A set of n patterns is said to be shattered by the neural net if G(n) = 2^n. The Vapnik-Chervonenkis (VC) dimension, d_VC, is precisely the largest number of patterns that can be shattered by a neural network. It corresponds, therefore, to the maximum number of training patterns that can be learned without error for all possible class labels. In the case of Figure 5.31, d_VC = 3.
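These definitions translate directly into a few lines of code. The sketch below, under the same assumed form of formula (5-51) as before, computes the growth function of the MLP:2:1:1 net, which by the footnote is just a single perceptron in d=2, so that G(n) = min(2^n, D(n,2)), and reads off d_VC as the largest shattered set size.

    from math import comb

    def D(n, d):
        # Assumed form of formula (5-51), as in the earlier sketch.
        return 2 * sum(comb(n - 1, i) for i in range(d + 1))

    def growth(n, d):
        # Growth function of a single perceptron: the implemented
        # dichotomies can never exceed the 2**n possible ones.
        return min(2 ** n, D(n, d))

    # d_VC: the largest n for which all 2**n dichotomies are implemented.
    d_vc = max(n for n in range(1, 20) if growth(n, 2) == 2 ** n)
    print(d_vc)                     # 3, as read off Figure 5.31

As a matter of fact, for a single perceptron, the number of linearly separable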


⁴ A two-layer MLP with h=1 is equivalent to a single perceptron.