Page 210 -
P. 210

198      5 Neural Networks





















                             Figure  5.35. The class of  functions F = {  (x,f(x) = ax + b); a, b~  32  )  pseudo-
                             shatters the set  {x,, x2]. The four functions shown achieve all possible "above=O/
                             belowml " combinations of  ( r,, r2 J .




                               Notice that the same set Xof Figure 5.35 is not pseudo-shattered by the class of
                             functions G = {  (x, g(x) = a); a€ '3 1.
                               The pseudo-shattering concept relates to the expressiveness or flexibility that a
                             class of functions must have in order to approximate a set of points. For instance,
                             in  the example  of  Figure 5.35 the class of  functions G is not flexible enough to
                             approximate  {(x1,r2), (x2,r2)), On  the other hand,  the class of  functions F  is not
                             flexible enough in other circumstances (see Exercise 5.14).
                               The maximum cardinality of a subset of X that is pseudo-shattered by F is called
                             the pseudo-dimension  of F, and denoted Pdim(F). For the example of Figure 5.35
                             Pdim(F)=2  and  Pdim(G)=l.  Therefore,  the  pseudo-dimension  of  a  class  of
                             functions reflects the expressiveness of the class.
                               Figure 5.35 shows that the "above/below" combinations of  {rl, r2) are actually
                             achieved with a margin of yaround the points ri. When it is possible to achieve this
                             separation we say that X is y-shattered by F. The corresponding pseudo-dimension
                             is now called pdimension or fat-shattering dimension and denoted fat(F,j).
                               A  theorem  due to  Simon (1997) allows the  computation of  the  fat-shattering
                             dimension for all functions mapping [0, 11 into [0, 11 and having a total variation of
                             V, as:





                             where the function trunc(x) returns the largest integer less than or equal to x and V
                             is the total function variation:
   205   206   207   208   209   210   211   212   213   214   215