For a hidden neuron the error term δ_j is more difficult to obtain, since it depends
on the errors at the output neurons it is connected to, as exemplified in Figure 5.24.
For this purpose we express δ_j as a summation of chained derivatives:
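The equation itself is missing from this copy. A plausible reconstruction of the standard chain-rule expansion, assuming the book writes E for the energy function and net_j for the total input of neuron j, is:

\[
\delta_j \;=\; -\frac{\partial E}{\partial net_j}
\;=\; \sum_k \left(-\frac{\partial E}{\partial net_k}\right)\frac{\partial net_k}{\partial net_j}
\;=\; \sum_k \delta_k\,\frac{\partial net_k}{\partial net_j}
\]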
Note that the first term in the summation corresponds to the back-propagated
error from an output neuron, δ_k, and the second term reflects the influence of the
activation function of the hidden neurons, as well as the weights connecting the
hidden neuron to the output neurons. Assuming that all activation functions are
equal, we can therefore write:
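Formula (5-23c) is not reproduced in this copy; in its standard form, under the book's apparent notation, it presumably reads:

\[
\delta_j \;=\; f'(net_j)\sum_k \delta_k\,w_{kj} \qquad (5\text{-}23c)
\]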
Notice how the error terms at the output neurons contribute to the error terms at
the hidden neurons. This back-propagation of the errors justifies the name of the
algorithm.
Using these errors and the gradient descent equations (5-7), we can now write
the formulas for updating the weights:
- Weight connecting output neuron k with hidden neuron j:
- Weight connecting hidden neuron j with input neuron i:
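The two update formulas, numbered (5-24a) and (5-24b) in the text, are not reproduced in this copy; in the standard gradient-descent form they presumably read (with y_j the output of hidden neuron j and x_i the i-th input):

\[
\Delta w_{kj} \;=\; \eta\,\delta_k\,y_j \qquad (5\text{-}24a)
\]
\[
\Delta w_{ji} \;=\; \eta\,\delta_j\,x_i \qquad (5\text{-}24b)
\]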
For more than two layers, the process of error back-propagation generalizes
easily using the back-propagation formula (5-23c).
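As an illustration of how formulas (5-23c), (5-24a) and (5-24b) fit together, the following is a minimal NumPy sketch of one back-propagation step for a single-hidden-layer perceptron with sigmoid activations and a sum-of-squares energy function. The variable names (W_hid, W_out, eta, and so on) and the choice of sigmoid are assumptions for illustration, not the book's own notation.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, t, W_hid, W_out, eta=0.1):
    """One gradient-descent step for a one-hidden-layer perceptron.

    x      : input vector, shape (n_in,)
    t      : target output vector, shape (n_out,)
    W_hid  : hidden-layer weights, shape (n_hid, n_in)
    W_out  : output-layer weights, shape (n_out, n_hid)
    """
    # Forward pass
    net_hid = W_hid @ x        # net inputs of hidden neurons
    y = sigmoid(net_hid)       # hidden-layer outputs
    net_out = W_out @ y        # net inputs of output neurons
    z = sigmoid(net_out)       # network outputs

    # Error terms: output deltas, then hidden deltas back-propagated via (5-23c);
    # z*(1-z) and y*(1-y) are the sigmoid derivatives f'(net)
    delta_out = (t - z) * z * (1 - z)
    delta_hid = (W_out.T @ delta_out) * y * (1 - y)

    # Weight updates by gradient descent, cf. (5-24a) and (5-24b)
    W_out += eta * np.outer(delta_out, y)
    W_hid += eta * np.outer(delta_hid, x)
    return W_hid, W_out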
The back-propagation algorithm uses formulas (5-24a) and (5-24b) with initial
random weights until the iterative gradient descent process reaches a minimum of
the energy function. The error hypersurface of a multi-layer perceptron depends on
several weight parameters, and is therefore expected to be quite complex and to
possibly have many local minima. Notice that such a simple problem as the one
presented in Figure 5.4 already exhibited local minima. Usually many trials have to
be performed, with different initial weights and learning factor η, in order to reach
the global minimum. Also, for large learning factors, one may obtain divergent
behaviour or wild oscillations around the minimum, as previously mentioned in the
ECG filter example in section 5.1. As a remedy to this oscillating behaviour it is
normal to include a momentum term in the weight updating formulas, dependent
upon the weight increment in the previous iteration, as follows:
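The momentum formula is not reproduced in this copy; the usual form, which is presumably what the text gives, adds a fraction α of the previous increment to the current gradient-descent step:

\[
\Delta w(n) \;=\; \eta\,\delta\,y \;+\; \alpha\,\Delta w(n-1)
\]

where α is the momentum factor, typically chosen between 0 and 1.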