Figure 5.34. Bounds for two-class MLPs with hard-limiters, computed for Pe = 0.05 and δ = 0.01 (logarithmic vertical scale).



The PR Size program provides, therefore, appropriate guidance in the choice of h and n for classification problems with neural networks. For the two-class cork stoppers classification problem described in section 5.3, an MLP2:1 with hard-limiter was used, which for a 95% confidence level of the respective error estimate corresponds to n_l = 27 and n_u = 150. The lower bound is satisfied, since we are using 100 patterns. However, we may be somewhat far from the optimal classifier, since the number of patterns we have available is lower than the estimated sufficient number for generalization, n_u.
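Formulas (5-53) and (5-55), from which such bounds are obtained, appear earlier in the chapter and are not reproduced on this page. As a rough illustration of how pattern-size bounds of this kind are evaluated, the following Python sketch uses the classical distribution-free PAC bounds; its constants, and the example d_VC value, are assumptions and may differ from those used by PR Size.

```python
import math

def pac_sample_bounds(d_vc, eps, delta):
    """Classical distribution-free PAC sample-size bounds for a
    hypothesis class of VC dimension d_vc, error tolerance eps and
    confidence 1 - delta.  Constants follow the well-known results of
    Blumer et al. (1989) and Ehrenfeucht et al. (1989); the book's
    formulas (5-53) and (5-55) may use different constants."""
    # Necessary number of patterns: with fewer, no learning algorithm
    # can guarantee error <= eps with probability >= 1 - delta.
    n_lower = math.ceil((d_vc - 1) / (32.0 * eps))
    # Sufficient number of patterns for any consistent learner.
    n_upper = math.ceil(max((4.0 / eps) * math.log2(2.0 / delta),
                            (8.0 * d_vc / eps) * math.log2(13.0 / eps)))
    return n_lower, n_upper

# Illustrative values only (d_vc = 3 is an assumption, not the value
# used for the MLP2:1 of the cork-stoppers example).
print(pac_sample_bounds(d_vc=3, eps=0.05, delta=0.05))
```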
When applying these bounds to MLPs one must take into account that these are initially trained with small weights, thereby introducing a bias in favour of solutions with small weights. This, in fact, reduces the effective d_VC, favouring smaller training sets. On the other hand, for MLPs with sigmoid activation functions the d_VC will be at least as great as formula (5-53) indicates, since a sigmoid unit can approximate a hard-limiter with arbitrary accuracy using sufficiently large weights. Therefore, the lower bound formula (5-55) also applies to networks with sigmoid units. Appropriate formulas for the upper bounds in this situation are presented in (Anthony and Bartlett, 1999).
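The approximation argument can be checked numerically. The short sketch below (a construction of our own; the grid and gain values are arbitrary) shows that the logistic unit sigmoid(k·x) converges to the unit step away from the origin as the weight (gain) k grows:

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hard_limiter(x):
    return 1.0 if x >= 0 else 0.0

# On a fixed grid excluding x = 0, the maximum deviation between
# sigmoid(k*x) and the hard-limiter shrinks as the gain k grows.
xs = [i / 100.0 for i in range(-100, 101) if i != 0]
for k in (1, 10, 100, 1000):
    max_dev = max(abs(sigmoid(k * x) - hard_limiter(x)) for x in xs)
    print(f"k={k:5d}  max deviation = {max_dev:.6f}")
```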
Bounds on the necessary and sufficient numbers of patterns needed for learning a regression function have also been studied by several authors. These bounds depend on a generalized version of the Vapnik-Chervonenkis dimension.
We now present some important concepts on this matter by first defining the pseudo-shattering of a set of points X = {x_i; x_i ∈ ℜ, i = 1, ..., m} by a class of functions F, mapping X into the set of real numbers.

We say that X is pseudo-shattered by F if there are m real numbers r_i and 2^m functions of F that achieve all possible "above/below" combinations with respect to the r_i. This is illustrated in Figure 5.35 for the class of linear functions F = {f: f(x) = ax + b; a, b ∈ ℜ}. The points r_i are said to witness the pseudo-shattering.
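The definition can be made concrete with a small construction of our own (the points, witnesses and margin below are arbitrary choices): for two points on the real line, every one of the 2^m = 4 above/below patterns relative to the witnesses r_i is realized by some member of the affine class F.

```python
from itertools import product

# Witnessed pseudo-shattering of two points by F = {f(x) = a*x + b}:
# for every "above/below" pattern there must be an f in F with
# f(x_i) > r_i where the pattern says "above" and f(x_i) < r_i
# where it says "below".
x = [0.0, 1.0]          # the set X to be pseudo-shattered
r = [0.5, -0.25]        # candidate witnesses r_i

for signs in product([+1, -1], repeat=len(x)):
    # Interpolate the targets r_i + s_i, which lie strictly above or
    # below r_i; a line through two points exists since x[0] != x[1].
    t = [ri + si for ri, si in zip(r, signs)]
    a = (t[1] - t[0]) / (x[1] - x[0])
    b = t[0] - a * x[0]
    achieved = all((a * xi + b - ri) * si > 0
                   for xi, ri, si in zip(x, r, signs))
    print(f"pattern {signs}: f(x) = {a:+.2f}x {b:+.2f}  realized: {achieved}")
```

Since all four patterns are realized, the chosen r_i witness the pseudo-shattering of X. With three or more points some patterns become infeasible: the affine functions on ℜ form a two-dimensional vector space, whose pseudo-dimension is exactly 2.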