have a guaranteed risk by summing up an empirical risk and a structural risk as
depicted in Figure 5.36.
Figure 5.36. Guaranteed risk of a neural network as a sum of empirical and
structural risks.
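Concretely, the decomposition sketched in Figure 5.36 is usually expressed through a bound of the Vapnik type; the formula below is the standard form (with n the number of training cases, h the VC dimension of the network, and 1 - eta the confidence with which the bound holds), given here as an illustration and not necessarily in the exact notation used elsewhere in this text:

R(w) \;\le\; \underbrace{R_{\mathrm{emp}}(w)}_{\text{empirical risk}}
\;+\;
\underbrace{\sqrt{\frac{h\left(\ln\frac{2n}{h}+1\right)-\ln\frac{\eta}{4}}{n}}}_{\text{structural (confidence) term}}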
For generalization purposes, we are interested in applying the principle of
structural risk minimization, SRM. An experimental way of minimizing the
structural risk consists of defining a sequence of networks of increasing complexity (increasing VC dimension), by
addition of more hidden neurons. For each network the empirical risk is
minimized, and one progresses to a more complex machine until reaching a
minimum of the guaranteed risk.
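As a rough illustration of this procedure, the following Python sketch grows a sequence of MLPs with an increasing number of hidden neurons and retrains each one. It assumes scikit-learn's MLPClassifier as the trainable machine and uses the error on a held-out set as a practical stand-in for the guaranteed risk, since the structural term is seldom computed exactly in practice.

# SRM-style experiment: increase complexity, minimise the empirical risk for
# each machine, and keep the network with the lowest estimated guaranteed risk.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_risk, best_net = np.inf, None
for n_hidden in range(2, 21, 2):              # sequence of machines of increasing complexity
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=2000, random_state=0)
    net.fit(X_tr, y_tr)                       # empirical risk minimisation for this machine
    emp_risk = 1.0 - net.score(X_tr, y_tr)
    est_risk = 1.0 - net.score(X_val, y_val)  # held-out error as a proxy for the guaranteed risk
    print(f"hidden={n_hidden:2d}  empirical={emp_risk:.3f}  estimated guaranteed={est_risk:.3f}")
    if est_risk < best_risk:
        best_risk, best_net = est_risk, net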
5.7 Approximation Methods in NN Training
In section 5.5 we saw how to train MLPs using the back-propagation algorithm,
based on a gradient descent technique. Some pitfalls of this technique were
explained in that section and in more detail in section 5.6.2, when we analysed the
influence of the Hessian matrix on the learning process. There are several
alternative algorithms for training MLPs that either attempt to improve the gradient
descent technique used in the back-propagation algorithm, or use a completely
different approach, not based on the gradient descent method. This last class of
algorithms uses ideas and techniques imported from the well-established body of
multivariate function optimisation methods. The reader can find a detailed
explanation of these techniques in (Fletcher, 1987) and their application to MLPs
in (Bishop, 1995). In this section we will present only two of these methods,
which converge very fast and do not require the specification of
parameters (learning rate and momentum factors) as the back-propagation algorithm does.
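As a small illustration of this optimisation-based view (and not necessarily of the particular methods presented in the remainder of this section), the Python sketch below trains a one-hidden-layer MLP by handing its weight vector to a general-purpose conjugate-gradient optimiser (scipy.optimize.minimize with method='CG'). The optimiser chooses its own step lengths, so no learning rate or momentum factor has to be specified.

# MLP training cast as multivariate function minimisation of the empirical risk.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)       # XOR-like target

n_in, n_hid = 2, 5

def unpack(w):
    # Split the flat weight vector into the layer weights and biases.
    i = 0
    W1 = w[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b1 = w[i:i + n_hid]; i += n_hid
    W2 = w[i:i + n_hid]; i += n_hid
    b2 = w[i]
    return W1, b1, W2, b2

def empirical_risk(w):
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)                    # hidden layer
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output
    return np.mean((out - y) ** 2)              # mean squared error

w0 = rng.normal(scale=0.5, size=n_in * n_hid + n_hid + n_hid + 1)
res = minimize(empirical_risk, w0, method='CG') # gradient approximated numerically here
print("final empirical risk:", res.fun)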