5.5 Multi-Layer Perceptrons
In order to apply the gradient descent concept to the multi-layer perceptron we
first have to compute the derivatives of the error as functions of the weights. The
derivative of the error as a function of any weight can be written using the chain
rule of partial derivatives, as follows:
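(In the notation assumed here, which may differ in detail from the book's own: x_i are the inputs, y_j = f(w_j·x) the hidden-neuron outputs, z_k = f(w_k·y) the network outputs, with f the activation function. The chain-rule decompositions can then be written, for the output and the hidden weights respectively, as:)

\[
\frac{\partial E}{\partial w_{kj}} =
\frac{\partial E}{\partial(\mathbf{w}_k^{T}\mathbf{y})}\,
\frac{\partial(\mathbf{w}_k^{T}\mathbf{y})}{\partial w_{kj}},
\qquad
\frac{\partial E}{\partial w_{ji}} =
\frac{\partial E}{\partial(\mathbf{w}_j^{T}\mathbf{x})}\,
\frac{\partial(\mathbf{w}_j^{T}\mathbf{x})}{\partial w_{ji}}.
\]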
Note that on the right-hand side of these formulas the first term represents the
derivative depending on the activation function, whereas the second term depends
only on the respective dot product, w_j·x for the hidden neurons and w_k·y for the
output neurons. The derivatives of these dot products with respect to w_ji and to w_kj
are simply x_i and y_j, respectively. Therefore, the first term has the decisive
contribution to the error derivative and we will denote it as follows:
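(A natural way to write these error terms, up to the book's own sign convention, is:)

\[
\delta_j \equiv \frac{\partial E}{\partial(\mathbf{w}_j^{T}\mathbf{x})},
\qquad
\delta_k \equiv \frac{\partial E}{\partial(\mathbf{w}_k^{T}\mathbf{y})}.
\]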
Let us now compute the error terms δ_j and δ_k. For the output neuron z_k we just
have to apply (5-20b) to the error formula, yielding:
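(Assuming a squared-error cost, E = ½ Σ_k (t_k − z_k)², and a sigmoid activation f with f' = f(1 − f), which is presumed to be what (5-20b) states, a standard form of this result is:)

\[
\delta_k = \frac{\partial E}{\partial(\mathbf{w}_k^{T}\mathbf{y})}
         = -(t_k - z_k)\, z_k\,(1 - z_k).
\]

(The sign and the exact factors depend on the conventions adopted for E and f.)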
Figure 5.24. A hidden neuron y_j receiving the back-propagated errors from two
output neurons.
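The computations above translate directly into code. The following is a minimal sketch, not taken from the book, of the forward pass and of the gradient formulas for both layers; it assumes sigmoid activations, a squared-error cost, and uses invented names (W_h, W_o, gradients) for the example:

import numpy as np

# Minimal sketch of the gradient computation described in the text, for one
# hidden layer, sigmoid activations and a squared-error cost. Notation follows
# the text: x = inputs, y = hidden outputs, z = network outputs,
# W_h[j, i] = w_ji, W_o[k, j] = w_kj, t = target vector.

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def gradients(x, t, W_h, W_o):
    # Forward pass: y_j = f(w_j . x), z_k = f(w_k . y).
    y = sigmoid(W_h @ x)
    z = sigmoid(W_o @ y)

    # Output error terms: delta_k = dE/d(w_k . y) = -(t_k - z_k) z_k (1 - z_k).
    delta_o = -(t - z) * z * (1.0 - z)

    # Hidden error terms: each y_j collects the back-propagated deltas of the
    # output neurons it feeds (cf. Figure 5.24), weighted by w_kj.
    delta_h = (W_o.T @ delta_o) * y * (1.0 - y)

    # dE/dw_kj = delta_k * y_j  and  dE/dw_ji = delta_j * x_i.
    grad_W_o = np.outer(delta_o, y)
    grad_W_h = np.outer(delta_h, x)
    return grad_W_h, grad_W_o

# Example usage with arbitrary sizes: 3 inputs, 4 hidden neurons, 2 outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
t = np.array([0.0, 1.0])
W_h = rng.normal(size=(4, 3))
W_o = rng.normal(size=(2, 4))
gW_h, gW_o = gradients(x, t, W_h, W_o)

A gradient descent step then moves each weight matrix against its gradient, e.g. W_o -= eta * gW_o and W_h -= eta * gW_h for a learning rate eta.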