




5.5 Multi-Layer Perceptrons


In the previous section we presented the feed-forward multi-layer perceptron, whose diagram is shown in Figure 5.20. This type of network is capable of more complex mappings than the single-layer perceptron and, for differentiable activation functions, there exists a powerful algorithm for finding a minimum error solution, based on the gradient descent concept and called error back-propagation.
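As a minimal sketch of the gradient descent concept (not code from the text), the following Python fragment performs one weight update for a single sigmoid unit under a squared-error criterion; error back-propagation extends this chain-rule computation to the hidden layers. The learning rate eta and the single-pattern update are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(w, b, x, t, eta=0.1):
    # Squared error E = 0.5*(t - y)^2 for one pattern; the chain rule gives
    # dE/dw = -(t - y) * y * (1 - y) * x, so gradient descent adds its negative.
    y = sigmoid(np.dot(w, x) + b)
    delta = (t - y) * y * (1.0 - y)        # error signal at the output unit
    return w + eta * delta * x, b + eta * delta

# Illustrative use: one update step toward target t = 1 for a single pattern.
w, b = np.zeros(2), 0.0
w, b = gradient_step(w, b, np.array([1.0, -1.0]), t=1.0)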
Let us first see what kinds of mappings a multi-layer perceptron is capable of, using a constructive argument (Lippman, 1987) for MLPs with activation functions of the threshold type, with outputs 0, 1.

With a single-layer perceptron only one linear discriminant can be implemented, as illustrated in Figure 5.23a for the MLP Sets dataset.
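Such a discriminant amounts to a single threshold unit that fires on one side of the line defined by its weights and bias. The weights and test points in the sketch below are illustrative only, not taken from the MLP Sets dataset.

import numpy as np

def threshold_neuron(x, w, b):
    # Threshold (Heaviside) unit: output 1 if w.x + b > 0, else 0.
    return int(np.dot(w, x) + b > 0)

# Illustrative discriminant: the line x1 + x2 - 1 = 0 splits the plane.
w, b = np.array([1.0, 1.0]), -1.0
print(threshold_neuron(np.array([0.2, 0.3]), w, b))   # 0: below the line
print(threshold_neuron(np.array([0.8, 0.9]), w, b))   # 1: above the line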
Let us now consider the two-layer perceptron of Figure 5.20. Each neuron of the hidden layer implements a linear discriminant. Assuming that the bias of the output neuron, with h hidden neurons, has a value in ]-h, -h+1[, then the output neuron will produce the value 1 only when all hidden neurons are 1. This corresponds to the intersection (AND operation) of the half-planes produced by the hidden neurons on the 1-value side, as exemplified in Figure 5.23b. We can thus perform the AND operation in the second layer, upon the discriminants obtained from the first layer, thereby building arbitrarily complex convex regions.
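A minimal sketch of this AND construction, assuming the ]-h, -h+1[ bias rule stated above; the triangle bounded by three discriminants is an illustrative convex region, not one of the sets of Figure 5.23.

import numpy as np

def threshold(z):
    return (z > 0).astype(int)

def two_layer_and(x, W, b):
    # Hidden layer: one threshold unit per row of W (a linear discriminant).
    h = threshold(W @ x + b)
    # Output bias -h + 0.5 lies in ]-h, -h+1[, so the output unit fires only
    # when every hidden neuron outputs 1 (the AND of the half-planes).
    return int(h.sum() - (len(b) - 0.5) > 0)

# Illustrative convex region: the triangle x1 > 0, x2 > 0, x1 + x2 < 1.
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
b = np.array([0.0, 0.0, 1.0])
print(two_layer_and(np.array([0.2, 0.2]), W, b))   # 1: inside the triangle
print(two_layer_and(np.array([0.8, 0.8]), W, b))   # 0: outside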
With a three-layer network we can arrange for the first two layers to generate a sufficiently fine grid of hypercubes, by using 2d hidden units in the first layer for each hypercube (a square in d=2 space needs 4 hidden units). Next, we can arrange for the output neuron to perform the union (OR operation) of the hypercubes of the second-layer neurons, using a bias in ]0, 1[ (e.g. 0.5). The output neuron will "fire" if any of the second-layer neurons "fires". In Figure 5.23c we apply this OR operation at the third layer to merge the disjoint clusters. Hence, three-layer networks can generate any arbitrarily complex mapping involving concave or disjoint regions.
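The three-layer construction can be sketched as follows, under the same threshold conventions: the first two layers build convex regions (here, two illustrative squares standing in for the grid of hypercubes) and the third layer takes their union; the value 0.5 plays the role of the bias in ]0, 1[, used here as a firing threshold.

import numpy as np

def threshold(z):
    return (z > 0).astype(int)

def convex_region(x, W, b):
    # Layers 1-2: AND of the linear discriminants defined by (W, b),
    # using an output bias of -h + 0.5, which lies in ]-h, -h+1[.
    h = threshold(W @ x + b)
    return int(h.sum() - (len(b) - 0.5) > 0)

def three_layer_or(x, regions):
    # Layer 3: union (OR) of the convex regions built by the second layer;
    # the output fires whenever at least one region contains x.
    fires = sum(convex_region(x, W, b) for W, b in regions)
    return int(fires > 0.5)

# Illustrative disjoint regions: the unit squares [0,1]x[0,1] and [2,3]x[0,1].
square1 = (np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]),
           np.array([0.0, 1.0, 0.0, 1.0]))
square2 = (np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]),
           np.array([-2.0, 3.0, 0.0, 1.0]))
print(three_layer_or(np.array([0.5, 0.5]), [square1, square2]))   # 1: first square
print(three_layer_or(np.array([2.5, 0.5]), [square1, square2]))   # 1: second square
print(three_layer_or(np.array([1.5, 0.5]), [square1, square2]))   # 0: neither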

Figure 5.23. Three sets of two-class points classifiable by: (a) Single-layer perceptron; (b) Two-layer perceptron; (c) Three-layer perceptron.