Notice that this is a purely constructive argument that justifies why a three-layer
network can achieve any arbitrarily complex mapping. It does not mean that MLPs,
using any appropriate learning algorithm, will necessarily converge to a solution
built with the AND operation at the second layer and the OR operation at the third
layer, although for simple problems they sometimes do (see Exercise 5.7). As a
matter of fact, training an MLP2:2:1 with logistic activation functions on the pattern
set shown in Figure 5.23b yielded the first-layer weights shown in Table 5.4.
It is a simple matter to confirm that the straight lines implemented by these first-
layer hidden neurons do indeed correspond to the boundaries of the shaded area in
Figure 5.23b, and that for this pattern set the constructive argument is verified.
Although there are decision boundaries that cannot be exactly implemented with
two-layer networks, it can be proved that two-layer networks with sigmoidal
activation functions can approximate any decision boundary with arbitrary
closeness (see e.g. Bishop, 1995). Therefore, we will pay more attention to
two-layer networks, particularly with regard to the complexity issue discussed in
section 5.6.4.
Table 5.4. Weights obtained for a MLP2:2:1 and dataset of Figure 5.23b.
                     Bias        w1         w2
  Hidden neuron 1    -13.000     9.7278     9.3740
  Hidden neuron 2    -8.3262     11.688     10.9780
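This confirmation is easy to carry out numerically. The following minimal sketch (in Python; the function names and the test point are ours, chosen only for illustration, and logistic activations are assumed as stated in the text) recovers the straight lines implemented by the two hidden neurons of Table 5.4 and checks that a point on the first line yields an activation of exactly 0.5:

```python
import numpy as np

def logistic(a):
    """Logistic (sigmoid) activation used by the MLP2:2:1."""
    return 1.0 / (1.0 + np.exp(-a))

# First-layer weights from Table 5.4, one row per hidden neuron: [bias, w1, w2].
hidden_weights = np.array([
    [-13.000,  9.7278,  9.3740],   # hidden neuron 1
    [ -8.3262, 11.688,  10.9780],  # hidden neuron 2
])

def hidden_outputs(x1, x2):
    """Hidden-neuron outputs for an input pattern (x1, x2)."""
    x = np.array([1.0, x1, x2])            # prepend 1 for the bias weight
    return logistic(hidden_weights @ x)

# Each neuron's decision line is the locus bias + w1*x1 + w2*x2 = 0,
# i.e. where the logistic output crosses 0.5.  Solving for x2:
for w0, w1, w2 in hidden_weights:
    print(f"x2 = {-w0 / w2:.4f} {-w1 / w2:+.4f} * x1")

# A point lying on neuron 1's line gives an activation of exactly 0.5:
print(hidden_outputs(0.0, 13.000 / 9.3740)[0])
```

Comparing the two printed lines with the boundaries of the shaded area in Figure 5.23b is then straightforward.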
5.5.1 The Back-Propagation Algorithm
The first and most popular weight adjustment algorithm for the multi-layer
perceptron was invented by Rumelhart et al. (1986). We will proceed to explain
its main steps for a network with two layers, denoting by i, j and k, respectively,
the indices for inputs (x), hidden neurons (y) and output neurons (z).
Let us first rewrite formula (5-2a), concerning the error obtained at an output
neuron k, for any input pattern, in a simplified way:

E_k = ½ (t_k − z_k)²,

where z_k denotes the neuron output and t_k the respective target value.
As seen in (5-18), each neuron of a multi-layer perceptron computes an output
that is a function of the dot product of the weight vector and the input vector. We
then have for hidden neurons and output neurons:
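As a concrete illustration of these two computations, the minimal sketch below (in Python; logistic activations are assumed, the bias weight is folded into each neuron's weight vector, and all names and the output-layer values are ours, made up only for this sketch) computes the hidden-neuron outputs y_j, the output-neuron outputs z_k, and the per-pattern error:

```python
import numpy as np

def logistic(a):
    """Logistic activation f(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W_hidden, W_output):
    """Forward pass of a two-layer MLP.

    x        : input pattern (without the bias component)
    W_hidden : one row [bias, w_1, ..., w_d] per hidden neuron j
    W_output : one row [bias, w_1, ..., w_h] per output neuron k
    Each neuron outputs the logistic of the dot product between its
    weight vector and its bias-extended input vector.
    """
    x_ext = np.concatenate(([1.0], x))      # prepend 1 for the bias weight
    y = logistic(W_hidden @ x_ext)          # hidden-neuron outputs y_j
    y_ext = np.concatenate(([1.0], y))
    z = logistic(W_output @ y_ext)          # output-neuron outputs z_k
    return y, z

def pattern_error(t, z):
    """Error for one pattern: sum over k of E_k = 1/2 (t_k - z_k)^2."""
    return 0.5 * np.sum((t - z) ** 2)

# Example: the hidden-layer weights of Table 5.4 together with a
# hypothetical output-layer weight vector (made up for this sketch).
W_hidden = np.array([[-13.000,  9.7278,  9.3740],
                     [ -8.3262, 11.688,  10.9780]])
W_output = np.array([[-4.0, 5.0, 5.0]])
y, z = forward(np.array([0.3, 0.7]), W_hidden, W_output)
print(y, z, pattern_error(np.array([1.0]), z))
```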