

Notice that this is a purely constructive argument that justifies why a three-layer network can achieve any arbitrarily complex mapping. It does not mean that MLPs, using any appropriate learning algorithm, will necessarily converge to a solution built with the AND operation at the second layer and the OR operation at the third layer, although for simple problems they sometimes do (see Exercise 5.7). As a matter of fact, training an MLP2:2:1 with logistic activation functions on the pattern set shown in Figure 5.23b yielded the first-layer weights shown in Table 5.4. It is a simple matter to confirm that the straight lines implemented by these first-layer hidden neurons do indeed correspond to the boundaries of the shaded area in Figure 5.23b, and that for this pattern set the constructive argument is verified.
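As a small illustration of this constructive idea, the following Python sketch (not part of the original text; the gain factor and thresholds are hypothetical choices) shows how logistic neurons with sufficiently large weights can emulate the AND and OR operations on near-binary inputs, as the second- and third-layer neurons of the constructive argument are assumed to do:

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def and_neuron(h, gain=10.0):
    # Fires (output near 1) only when all inputs are near 1.
    return logistic(gain * (np.sum(h) - (len(h) - 0.5)))

def or_neuron(h, gain=10.0):
    # Fires (output near 1) when at least one input is near 1.
    return logistic(gain * (np.sum(h) - 0.5))

for h in ([0, 0], [0, 1], [1, 1]):
    print(h, "AND:", round(float(and_neuron(np.array(h))), 3),
             "OR:", round(float(or_neuron(np.array(h))), 3))
```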
Although there are decision boundaries that cannot be exactly implemented with two-layer networks, it can be proved that two-layer networks with sigmoidal activation functions can approximate any decision boundary with arbitrary closeness (see e.g. Bishop, 1995). Therefore, we will pay more attention to two-layer networks, particularly with regard to the complexity issue discussed in section 5.6.4.

Table 5.4. Weights obtained for a MLP2:2:1 and dataset of Figure 5.23b.

                     Bias          w1           w2
  Hidden neuron 1    -13.000       9.7278       9.3740
  Hidden neuron 2    -8.3262       11.688       10.9780
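To make this check concrete, the short Python sketch below (illustrative, not from the original text) recovers the boundary line implemented by each hidden neuron from the weights in Table 5.4. For a logistic activation, the decision line is the locus where the weighted input is zero, i.e. where the neuron output equals 0.5.

```python
# Weights of the two hidden neurons from Table 5.4: (bias, w1, w2).
hidden_weights = {
    "hidden neuron 1": (-13.000, 9.7278, 9.3740),
    "hidden neuron 2": (-8.3262, 11.688, 10.9780),
}

for name, (bias, w1, w2) in hidden_weights.items():
    # Solve bias + w1*x1 + w2*x2 = 0 for x2, giving the straight-line boundary.
    slope = -w1 / w2
    intercept = -bias / w2
    print(f"{name}: x2 = {slope:.4f} * x1 + {intercept:.4f}")
```

The two printed lines can then be compared directly with the boundaries of the shaded area in Figure 5.23b.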




                                  5.5.1 The Back-Propagation Algorithm

The first and most popular weight adjustment algorithm for the multi-layer perceptron was invented by Rumelhart et al. (1986). We will proceed to explain its main steps for a network with two layers, denoting by i, j and k, respectively, the indices for inputs (x), hidden neurons (y) and output neurons (z).
Let us first rewrite formula (5-2a), concerning the error obtained at the output neurons, for any input pattern, in a simplified way:

$$E = \frac{1}{2}\sum_{k}(t_k - z_k)^2 ,$$

where $z_k$ denotes the neuron output and $t_k$ the corresponding target value.
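As a concrete illustration of this error measure, and of the forward computation described next, here is a minimal Python sketch (logistic activations assumed, consistent with the networks above; the output-layer weights and the input pattern are hypothetical values chosen only for the example):

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W_hidden, b_hidden, W_out, b_out):
    # Each neuron applies the activation to the dot product of weights and inputs.
    y = logistic(W_hidden @ x + b_hidden)   # hidden outputs y_j
    z = logistic(W_out @ y + b_out)         # output values z_k
    return y, z

def pattern_error(t, z):
    # Simplified per-pattern error: half the squared deviation, summed over outputs.
    return 0.5 * np.sum((t - z) ** 2)

# 2:2:1 architecture; hidden weights from Table 5.4, the rest are hypothetical.
W_hidden = np.array([[9.7278, 9.3740], [11.688, 10.9780]])
b_hidden = np.array([-13.000, -8.3262])
W_out = np.array([[1.0, 1.0]])
b_out = np.array([-1.5])

x = np.array([0.8, 0.9])    # one input pattern
t = np.array([1.0])         # its target value
y, z = forward(x, W_hidden, b_hidden, W_out, b_out)
print("hidden outputs:", y, "network output:", z, "error:", pattern_error(t, z))
```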
As seen in (5-18), each neuron of a multi-layer perceptron computes an output that is a function of the dot product of the weight vector and the input vector. We then have for hidden neurons and output neurons: