

5.6.4 Network Complexity

In order to appropriately solve the bias-variance trade-off described in the preceding section, choosing a sensible network complexity for a given dataset is an important task. Note that the bias-variance trade-off affords the appropriate insight into the dimensionality ratio issue: for adequate training of a complex network with w weights, we need to set a lower bound on the number n of patterns needed to ensure that the network has generalization capability, yielding reproducible results in independent test sets. Note also that, except for single-layer networks, the number of weights will often be much larger than the dimension d of the pattern vectors.
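As an illustration of this last point, the following minimal Python sketch counts the weights of a fully connected perceptron with one hidden layer; the architecture (d inputs, h hidden neurons, c outputs, bias terms included) and the function name are merely assumed for the example.

```python
def mlp_weight_count(d: int, h: int, c: int) -> int:
    """Number of adjustable weights of a one-hidden-layer MLP with
    d inputs, h hidden neurons and c outputs (bias terms included)."""
    return (d + 1) * h + (h + 1) * c

# Even a modest network has far more weights w than input dimensions d:
print(mlp_weight_count(d=4, h=10, c=2))   # -> 72 weights for d = 4
```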
In statistical classification it is possible to derive formulas describing, for instance, the training set error behaviour of Bayesian normal classifiers as a function of the n/d ratio. The problem is more complicated with neural networks, where no such formulas are available. Instead, it is possible to derive bounds for the n/w ratio. As the modelling capability of a neural network is a function of the discriminants implemented by its neurons, such bounds are essentially based on the combinatorial study of the class labellings that are feasible with sets of linear discriminants.
For this purpose, let us first consider the number of linearly separable dichotomies of n points, where a dichotomy is any partition of the n patterns into two disjoint groups. We assume that the n patterns are regularly distributed in ℜ^d, i.e., no subset of (d+1) patterns is contained in a hyperplane of ℜ^d. For the two-dimensional space, this corresponds to saying that no group of 3 points is collinear.
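For the two-dimensional case this regular-distribution condition is easy to verify; the following minimal sketch (the function name is ours, chosen for illustration) tests whether any three points are collinear by means of a cross-product test.

```python
from itertools import combinations

def in_general_position_2d(points) -> bool:
    """True if no three of the given 2-D points are collinear."""
    for (x1, y1), (x2, y2), (x3, y3) in combinations(points, 3):
        # Cross product of (p2 - p1) and (p3 - p1); zero means collinear.
        if (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1) == 0:
            return False
    return True

print(in_general_position_2d([(0, 0), (1, 0), (0, 1), (1, 1)]))  # -> True
print(in_general_position_2d([(0, 0), (1, 1), (2, 2)]))          # -> False
```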
It is possible to prove that, given n regularly distributed points in ℜ^d, the number of linearly separable dichotomies, D(n, d), is given by:

$$
D(n, d) =
\begin{cases}
2^{\,n}, & n \le d+1\\[4pt]
2\displaystyle\sum_{i=0}^{d} \binom{n-1}{i}, & n > d+1
\end{cases}
\tag{5-51}
$$
When n = d+1, both expressions give the same result. The proof of this result, known as Cover's theorem, can be found in (Cover, 1965).
For d = 2, the application of (5-51) gives the following results:

- For n=3, all the 2^3 = 8 dichotomies of the three points are linearly separable;
- For n=4, only 14 of the 2^4 = 16 dichotomies are linearly separable;
- For n=5, only 22 of the 2^5 = 32 dichotomies are linearly separable.
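These counts can be checked with a minimal Python sketch of formula (5-51); the function name cover_dichotomies is merely illustrative.

```python
from math import comb

def cover_dichotomies(n: int, d: int) -> int:
    """Linearly separable dichotomies of n points in general
    position in d dimensions, formula (5-51) (Cover, 1965)."""
    if n <= d + 1:
        return 2 ** n
    return 2 * sum(comb(n - 1, i) for i in range(d + 1))

# Reproduce the d = 2 values listed above:
for n in (3, 4, 5):
    print(n, cover_dichotomies(n, 2))   # -> 8, 14, 22
```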

Using polynomial decision functions is equivalent to using linear discriminants in a higher-dimensional space, as seen in 2.1.1. Thus, using quadratic decision functions in d = 2 is equivalent to using linear decision functions in (d+1)(d+2)/2 - 1 = (2+1)(2+2)/2 - 1 = 5 dimensions, the number of monomials of degree up to 2 excluding the constant term. This means that for n = 5 patterns, all dichotomies would be separable by the quadratic decision functions, since they are linearly separable in the 5-dimensional space (D(5, 5) = 2^5 = 32).
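Under the same illustrative assumptions as the previous sketch, the following lines verify this equivalence numerically: the quadratic feature space for d = 2 has 5 dimensions, and since n = 5 ≤ 5 + 1, formula (5-51) yields all 2^5 = 32 dichotomies.

```python
from math import comb

def cover_dichotomies(n: int, d: int) -> int:
    if n <= d + 1:
        return 2 ** n
    return 2 * sum(comb(n - 1, i) for i in range(d + 1))

def quadratic_dim(d: int) -> int:
    """Dimension of the quadratic feature space: all monomials of
    degree <= 2 in d variables, excluding the constant term."""
    return (d + 1) * (d + 2) // 2 - 1

d_star = quadratic_dim(2)
print(d_star)                         # -> 5
print(cover_dichotomies(5, d_star))   # -> 32, i.e. all 2**5 dichotomies
```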