5.5 Multi-Layer Perceptrons
In the previous section we presented the feed-forward multi-layer perceptron,
whose diagram is shown in Figure 5.20. This type of network is capable of more
complex mappings than the single-layer perceptron and, for differentiable
activation functions, there is a powerful algorithm, based on the gradient
descent concept, for finding a minimum error solution: error back-propagation.
Let us first see what kinds of mappings a multi-layer perceptron is capable of,
using a constructive argument (Lippmann, 1987) for MLPs with activation
functions of the threshold type, with outputs 0 and 1.
With a single-layer perceptron only one linear discriminant can be implemented,
as illustrated in Figure 5.23a for the MLP Sets dataset.
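To make this concrete, here is a minimal sketch (illustrative names and data, not from the text) of a single threshold unit computing one linear discriminant:

```python
import numpy as np

def threshold_unit(x, w, w0):
    # Step activation with outputs 0/1: fire when the net input w.x + w0 is positive.
    return (np.dot(x, w) + w0 > 0).astype(int)

# One linear discriminant in the plane: the line x1 + x2 - 1 = 0.
x = np.array([[0.2, 0.3], [0.8, 0.9]])
print(threshold_unit(x, w=np.array([1.0, 1.0]), w0=-1.0))  # -> [0 1]
```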
Let us now consider the two-layer perceptron of Figure 5.20. Each neuron of the
hidden layer implements a linear discriminant. Assuming that the output
neuron, with h hidden neurons as inputs, has unit weights and a bias in
]-h, -h+1[, then the output neuron will produce the value 1 only when all
hidden neurons are 1: its net input, the number of firing hidden neurons plus
the bias, is positive only when all h fire. This corresponds to the
intersection (AND operation) of the half-planes produced by the hidden
neurons on their 1-value side, as exemplified in Figure 5.23b. We can perform the
AND operation in the second layer, upon the discriminants obtained from the first
layer, thereby building arbitrarily complex convex regions.
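As a sketch of this AND construction (illustrative code, not from the text; the region and weights are assumptions), consider a two-layer perceptron whose output unit has unit weights and bias -h + 0.5:

```python
import numpy as np

def step(net):
    # Threshold activation with outputs 0/1.
    return (net > 0).astype(int)

def two_layer_and(x, W, w0):
    # Hidden layer: h linear discriminants (rows of W, biases w0).
    # Output layer: AND of the h half-planes, with unit weights and
    # a bias in ]-h, -h+1[ (here -h + 0.5).
    h = step(x @ W.T + w0)
    return step(h.sum(axis=1) - W.shape[0] + 0.5)

# Unit square as the intersection of four half-planes:
# x1 > 0, x2 > 0, x1 < 1, x2 < 1.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
w0 = np.array([0.0, 0.0, 1.0, 1.0])
pts = np.array([[0.5, 0.5], [1.5, 0.5]])
print(two_layer_and(pts, W, w0))  # -> [1 0]: inside, outside
```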
With a three-layer network we can arrange for the first two layers to generate a
sufficiently fine grid of hypercubes, using 2d hidden units in the first layer for
each hypercube (a square in d=2 space needs 4 hidden units). Next, we can arrange
for the output neuron to perform the union (OR operation) of the hypercubes of
the second layer neurons, using unit weights and a bias in ]-1, 0[ (e.g. -0.5),
so that its net input is positive whenever at least one second layer neuron
fires. The output neuron will "fire" if any of the second layer neurons "fire".
In Figure 5.23c we apply this OR operation at the third layer to merge the
disjoint clusters. Hence, three-layer networks can generate any arbitrarily
complex mapping involving concave or disjoint regions.
Figure 5.23. Three sets of two-class points classifiable by: (a) Single-layer
perceptron; (b) Two-layer perceptron; (c) Three-layer perceptron.
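Continuing the previous sketch (reusing step, two_layer_and, W and w0 from above; the second square is an illustrative assumption), the full three-layer construction ORs two convex regions into a disjoint, non-convex union:

```python
def three_layer_or(x, regions):
    # Third layer: OR of the region units, with unit weights and
    # a bias in ]-1, 0[ (here -0.5): fires if any region unit fires.
    fired = np.column_stack([two_layer_and(x, W, w0) for W, w0 in regions])
    return step(fired.sum(axis=1) - 0.5)

# A second unit square, [2,3] x [0,1], disjoint from the first one.
W2 = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
w02 = np.array([-2.0, 0.0, 3.0, 1.0])

pts = np.array([[0.5, 0.5], [2.5, 0.5], [1.5, 0.5]])
print(three_layer_or(pts, [(W, w0), (W2, w02)]))  # -> [1 1 0]
```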