Page 210 -
P. 210
198 5 Neural Networks
Figure 5.35. The class of functions F = { (x,f(x) = ax + b); a, b~ 32 ) pseudo-
shatters the set {x,, x2]. The four functions shown achieve all possible "above=O/
belowml " combinations of ( r,, r2 J .
Notice that the same set Xof Figure 5.35 is not pseudo-shattered by the class of
functions G = { (x, g(x) = a); a€ '3 1.
The pseudo-shattering concept relates to the expressiveness or flexibility that a
class of functions must have in order to approximate a set of points. For instance,
in the example of Figure 5.35 the class of functions G is not flexible enough to
approximate {(x1,r2), (x2,r2)), On the other hand, the class of functions F is not
flexible enough in other circumstances (see Exercise 5.14).
The maximum cardinality of a subset of X that is pseudo-shattered by F is called
the pseudo-dimension of F, and denoted Pdim(F). For the example of Figure 5.35
Pdim(F)=2 and Pdim(G)=l. Therefore, the pseudo-dimension of a class of
functions reflects the expressiveness of the class.
Figure 5.35 shows that the "above/below" combinations of {rl, r2) are actually
achieved with a margin of yaround the points ri. When it is possible to achieve this
separation we say that X is y-shattered by F. The corresponding pseudo-dimension
is now called pdimension or fat-shattering dimension and denoted fat(F,j).
A theorem due to Simon (1997) allows the computation of the fat-shattering
dimension for all functions mapping [0, 11 into [0, 11 and having a total variation of
V, as:
where the function trunc(x) returns the largest integer less than or equal to x and V
is the total function variation: