Let us analyse in particular the case d=2, h=1. There are, of course, 2 linearly separable regions by one discriminant, which a single perceptron implements. As confirmation, formula (5-52) gives R(1,2)=2. On the other hand, with one discriminant, the maximum number of pattern dichotomies one can have is given by the previous formula (5-51) of D(n, d), represented in Figure 5.31. Notice that up to n=3, all 2^n dichotomies are linearly separable. For increasing values of n, the number of dichotomies that can be implemented decreases considerably compared with the number of possible dichotomies.
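These counts are easy to reproduce numerically. The following minimal sketch assumes that formula (5-51) is Cover's counting function, D(n,d) = 2·Σ_{i=0}^{d} C(n-1, i), for n patterns in general position in d dimensions (an assumption, since the formula itself appears earlier in the chapter); the use of Python is merely illustrative. It confirms that for d=2 all 2^n dichotomies are separable up to n=3, after which D(n,2) falls below 2^n.

```python
from math import comb

def D(n, d):
    """Assumed form of formula (5-51): number of linearly separable
    dichotomies of n patterns in general position in d dimensions."""
    return 2 * sum(comb(n - 1, i) for i in range(d + 1))

# Compare with the total number of dichotomies, 2**n, for d=2.
for n in range(1, 8):
    print(n, D(n, 2), 2 ** n)
# Up to n=3 every dichotomy is linearly separable (D(n,2) = 2**n);
# from n=4 onwards D(n,2) < 2**n (e.g. D(4,2) = 14 < 16).
```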
Let us now increase the number of hidden neurons to h=2, for a d=2 problem.
We then have a maximum of R(2,2)=4 linearly separable regions by two
discriminants. With regularly distributed n=4 patterns, all the possible dichotomies
(class labellings) can be implemented with these two discriminants (Figure 5.32a).
The same happens for a set of n=5 patterns forming a convex hull, as shown in
Figure 5.32b. However, for a set of n=6 patterns forming a convex hull, it is not
possible to obtain all dichotomies (Figure 5.32c).
Figure 5.32. Dichotomies for a single-hidden-layer perceptron with h=2: (a) Implemented with three regions, n=4; (b) Implemented with four regions, n=5; (c) Cannot be implemented with four regions, n=6.
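The region counts R(1,2)=2 and R(2,2)=4 used above can also be checked directly. The short sketch below assumes that formula (5-52) is the standard hyperplane-arrangement count, R(h,d) = Σ_{i=0}^{d} C(h,i), giving the maximum number of regions that h discriminants define in a d-dimensional space; this is a plausible reading of the formula referenced in the text, not a quotation of it.

```python
from math import comb

def R(h, d):
    """Assumed form of formula (5-52): maximum number of regions defined
    by h discriminants (hyperplanes) in a d-dimensional space."""
    return sum(comb(h, i) for i in range(d + 1))

print(R(1, 2))  # 2 regions for one discriminant in the plane
print(R(2, 2))  # 4 regions for two discriminants in the plane
```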
Therefore, for h=2 hidden neurons the number of dichotomies that can be implemented departs from the number of possible dichotomies at n=6. In general, this behaviour is obtained for any value of h and is related to the important concept of Vapnik-Chervonenkis dimension.
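Before defining this concept formally, the shattering idea can be made concrete with a brute-force check for the single-perceptron case (d=2): for each of the 2^n class labellings of n points in general position, test linear separability with a small feasibility linear programme. The point set (vertices of a regular polygon) and the use of scipy are illustrative choices, not the book's; the result, all labellings separable for n=3 but only 14 of 16 for n=4, agrees with the value d_VC=3 discussed next.

```python
from itertools import product
import numpy as np
from scipy.optimize import linprog

def separable(points, labels):
    """Feasibility LP: is there (w, b) with y_i (w . x_i + b) >= 1 for all i?"""
    y = np.asarray(labels, dtype=float)
    X = np.hstack([points, np.ones((len(points), 1))])  # append bias column
    A_ub = -y[:, None] * X                              # -y_i (x_i, 1).(w, b) <= -1
    b_ub = -np.ones(len(points))
    res = linprog(c=np.zeros(X.shape[1]), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * X.shape[1], method="highs")
    return res.status == 0  # status 0: a separating (w, b) exists

for n in (3, 4):
    # n points in general position: vertices of a regular polygon
    angles = 2 * np.pi * np.arange(n) / n
    pts = np.column_stack([np.cos(angles), np.sin(angles)])
    count = sum(separable(pts, labs) for labs in product((-1, 1), repeat=n))
    print(n, count, 2 ** n)  # expected: 3 -> 8 of 8; 4 -> 14 of 16
```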
Let G(n) be the maximum number of distinct dichotomies implemented by a neural net. This function, called the growth function, is shown with a solid line in Figure 5.31 for the MLP2:1:1 case. A set of n patterns is said to be shattered by the neural net if G(n) = 2^n. The Vapnik-Chervonenkis (VC) dimension, d_VC, is precisely the largest number of patterns that can be shattered by a neural network. It corresponds, therefore, to the maximum number of training patterns that can be learned without error for all possible class labels. In the case of Figure 5.31, d_VC = 3. As a matter of fact, for a single perceptron, the number of linearly separable
⁴ A two-layer MLP with h=1 is equivalent to a single perceptron.