
5.5 Multi-Layer Perceptrons

In order to apply the gradient descent concept to the multi-layer perceptron we first have to compute the derivatives of the error as functions of the weights. The derivative of the error as a function of any weight can be written using the chain rule of partial derivatives, as follows:

$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial(\mathbf{w}_j \cdot \mathbf{x})}\,\frac{\partial(\mathbf{w}_j \cdot \mathbf{x})}{\partial w_{ji}} \qquad \text{(hidden neurons)};$$

$$\frac{\partial E}{\partial w_{kj}} = \frac{\partial E}{\partial(\mathbf{w}_k \cdot \mathbf{y})}\,\frac{\partial(\mathbf{w}_k \cdot \mathbf{y})}{\partial w_{kj}} \qquad \text{(output neurons)}.$$
Note that on the right-hand side of these formulas the first term represents the derivative that depends on the activation function, whereas the second term involves only the dot product, $\mathbf{w}_j \cdot \mathbf{x}$ and $\mathbf{w}_k \cdot \mathbf{y}$, respectively for the hidden neurons and for the output neurons. The derivatives of these dot products with respect to $w_{ji}$ and to $w_{kj}$ are simply $x_i$ and $y_j$, respectively. Therefore, the first term has the decisive contribution to the error and we will denote it as follows:

$$\delta_j = \frac{\partial E}{\partial(\mathbf{w}_j \cdot \mathbf{x})}; \tag{5-20a}$$

$$\delta_k = \frac{\partial E}{\partial(\mathbf{w}_k \cdot \mathbf{y})}. \tag{5-20b}$$
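As a purely illustrative check (not part of the original text), the short Python sketch below verifies by a finite difference that the derivative of the dot product $\mathbf{w}_j \cdot \mathbf{x}$ with respect to a single weight $w_{ji}$ is just the corresponding input $x_i$; the vectors and the chosen index are arbitrary example values.

    import numpy as np

    # Arbitrary example values (illustration only).
    x = np.array([0.5, -1.2, 2.0])   # inputs x_i
    w = np.array([0.3, 0.8, -0.4])   # weights w_j of one hidden neuron

    i, eps = 1, 1e-6                 # perturb the single weight w_ji, here i = 1
    w_pert = w.copy()
    w_pert[i] += eps

    # Finite-difference estimate of d(w.x)/dw_ji versus the analytic value x_i.
    numeric = (np.dot(w_pert, x) - np.dot(w, x)) / eps
    print(numeric, x[i])             # both are approximately -1.2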
Let us now compute the error terms $\delta_j$ and $\delta_k$. For the output neuron $z_k$ we just have to apply (5-20b) to the error formula; with the squared error $E = \frac{1}{2}\sum_k (t_k - z_k)^2$, where $t_k$ is the target value and $z_k = f(\mathbf{w}_k \cdot \mathbf{y})$, this yields:

$$\delta_k = \frac{\partial E}{\partial(\mathbf{w}_k \cdot \mathbf{y})} = -(t_k - z_k)\, f'(\mathbf{w}_k \cdot \mathbf{y}).$$
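A minimal Python sketch of this output-layer error term follows (an illustration, not code from the book); it assumes a sigmoid activation $f$, the squared-error formula above, and arbitrary example values for the hidden outputs, the output weights and the targets.

    import numpy as np

    def f(u):                        # assumed sigmoid activation
        return 1.0 / (1.0 + np.exp(-u))

    def f_prime(u):                  # derivative of the sigmoid
        s = f(u)
        return s * (1.0 - s)

    # Arbitrary example values: hidden outputs y_j, one weight row w_k per
    # output neuron, and targets t_k.
    y = np.array([0.2, 0.7, 0.1])
    W_out = np.array([[0.5, -0.3, 0.8],
                      [0.1,  0.4, -0.6]])
    t = np.array([1.0, 0.0])

    net_k = W_out @ y                        # dot products w_k . y
    z = f(net_k)                             # output values z_k
    delta_k = -(t - z) * f_prime(net_k)      # error terms delta_k of (5-20b)

    # Gradients with respect to the output weights: dE/dw_kj = delta_k * y_j
    grad_W_out = np.outer(delta_k, y)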
Figure 5.24. A hidden neuron $y_j$ receiving the back-propagated errors from two output neurons.