decreasing learning rate, finishing with a small value after a large number of
epochs.
Figure 5.25. Learning curve for the dataset represented in Figure 5.23c. More than
20000 epochs are needed for a definite convergence path.
The momentum factor is chosen in the range [0, 1[, and it is advisable to
decrease it during training.
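As an illustration of both recommendations, the following sketch applies a plain gradient-descent update in which the learning rate and the momentum factor are decayed from an initial to a final value over the training epochs. The exponential decay schedule, the numeric values, and the placeholder gradient are illustrative assumptions, not prescriptions from the text.

    import numpy as np

    def decayed(v0, v1, epoch, num_epochs):
        # Exponential interpolation from the initial value v0 down to v1.
        return v0 * (v1 / v0) ** (epoch / num_epochs)

    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=5)   # weight vector of some layer
    dw_prev = np.zeros_like(w)          # previous update, reused by the momentum term

    num_epochs = 20000
    for epoch in range(num_epochs):
        grad = w                        # placeholder for the backpropagated gradient dE/dw
        eta = decayed(0.5, 0.02, epoch, num_epochs)  # learning rate, ending at a small value
        mu = decayed(0.9, 0.1, epoch, num_epochs)    # momentum factor, decreased during training
        dw = -eta * grad + mu * dw_prev              # gradient step plus momentum term
        w = w + dw
        dw_prev = dw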
Note that when a class is represented by very few patterns, training may take a
long time to reach the optimal solution. This is a consequence of the fact that
the error energy is then only weakly influenced by the errors of the poorly
represented class. This effect is exemplified by the set 3 training of the MLP
Sets data (Figure 5.23c), as illustrated in Figure 5.25. Convergence to the
global minimum, using an MLP2:4:2:1, was observed in only one trial, after more
than 20000 iterations. For some initial values of the learning parameters no
convergence was observed at all. In such difficult cases it is advisable to use
small training factors; in the case of Figure 5.25 a value of 0.02 was used.
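The weak influence of a rare class on the error energy is easy to quantify. In the hypothetical split below (made-up numbers, not taken from the book's data), the minority class holds 5 of 105 patterns; even with the same per-pattern squared error as the majority class, it contributes under 5% of the total energy, so the gradient is dominated by the majority class.

    import numpy as np

    # Hypothetical per-pattern squared errors, equal for both classes.
    err_majority = np.full(100, 0.25)   # 100 patterns in the well-represented class
    err_minority = np.full(5, 0.25)     # only 5 patterns in the poorly represented class

    E = 0.5 * (err_majority.sum() + err_minority.sum())  # total error energy
    share = 0.5 * err_minority.sum() / E
    print(f"Minority-class share of the error energy: {share:.1%}")  # -> 4.8%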
Local minima
In order to avoid local minima of the energy function, one can run several training
experiments with different specifications for the initial weights and the learning
and momentum factors. The number of experiments, r, with different random
starting weights, needed to ensure that a network will reach a solution within a
desirable lower percentile of all possible experiments is given by (Iyer and
Rhinehart, 1999):

r = ln(1 - a) / ln(1 - p),

where p is the percentile, expressed as a fraction, and a is the confidence
level. The formula follows from requiring that at least one of r independent
runs fall within the best fraction p of solutions, an event of probability
1 - (1 - p)^r, with confidence at least a.
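A quick computation makes the formula concrete. The snippet below (a sketch; the function name and parameter names are ours) rounds r up to the next integer; for example, reaching the best 5% of solutions (p = 0.05) with 99% confidence (a = 0.99) requires about 90 training runs.

    import math

    def num_repetitions(p, a):
        # Smallest r with 1 - (1 - p)**r >= a, i.e. r >= ln(1 - a) / ln(1 - p).
        return math.ceil(math.log(1 - a) / math.log(1 - p))

    print(num_repetitions(p=0.05, a=0.99))   # -> 90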