Page 49 - Rapid Learning in Robotics
could be extracted from a sequence of three-word sentences (Kohonen
1990; Ritter and Kohonen 1989). The topology-preserving properties
enable cooperative learning in order to increase the speed and
robustness of learning, studied e.g. in Walter, Martinetz, and Schulten
(1991) and compared to the so-called Neural-Gas network in Walter
(1991) and Walter and Schulten (1993).
In contrast to the SOM, the Neural-Gas network exhibits not a fixed
grid topology but a “gas-like”, dynamic definition of the neighbor-
hood function, which is determined by the (dynamic) ranking of close-
ness in the input space (Martinetz and Schulten 1991). This results in
advantages for applications with an inhomogeneous or unknown topol-
ogy (e.g. the prediction of chaotic time series like the Mackey-Glass
series in Walter (1991), later also published in Martinetz et al.
(1993)).
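The ranking-based neighborhood can be sketched in a few lines. The following is a minimal toy illustration, not the setup of the cited papers: all parameter values, the decay schedules, and the ring-shaped input distribution are illustrative assumptions. Each unit is adapted according to its distance *rank* to the input, so no grid topology is ever imposed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 20 reference vectors in a 2-D input space.
weights = rng.uniform(0.0, 1.0, size=(20, 2))

def neural_gas_step(weights, x, eps, lam):
    """One Neural-Gas adaptation step: the neighborhood h is a function
    of the distance rank of each unit to x, not of a fixed grid."""
    dists = np.linalg.norm(weights - x, axis=1)
    ranks = np.argsort(np.argsort(dists))      # rank 0 = closest unit
    h = np.exp(-ranks / lam)                   # "gas-like" neighborhood
    return weights + eps * h[:, None] * (x - weights)

# Train on inputs drawn from a ring, a topology a fixed 2-D grid fits poorly.
T = 2000
for t in range(T):
    phi = rng.uniform(0.0, 2.0 * np.pi)
    x = 0.5 + 0.4 * np.array([np.cos(phi), np.sin(phi)])
    frac = t / T
    # exponentially decaying step size and neighborhood range (assumed schedule)
    weights = neural_gas_step(weights, x,
                              eps=0.5 * (0.01 / 0.5) ** frac,
                              lam=5.0 * (0.1 / 5.0) ** frac)
```

After training, the units distribute themselves along the ring, following the (here one-dimensional) structure of the data rather than any predefined lattice.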
The choice of the type of approximation function introduces bias and
restricts the variance of the possible solutions. This fundamental
relation is called the bias–variance problem (Geman et al. 1992). As indicated
before, this bias and the corresponding variance reduction can be good or
bad, depending on the suitability of the choice. The next section discusses
the problem of over-using the variance of a chosen approximation ansatz,
especially in the presence of noise.
3.5 Strategies to Avoid Over-Fitting
Over-fitting can occur when the function f is approximated in the do-
main D using only a too limited number of training data points D_train. If
the ratio of free parameters versus training points is too high, the approxi-
mation fits the noise, as illustrated by Fig. 3.4. This results in a reduced
generalization ability. Besides the proper selection of an appropriate net-
work structure, several strategies can help to avoid the over-fitting effect:
Early stopping: During incremental learning the approximation error is
systematically decreased, but at some point the expected error or
lack-of-fit LOF(F, D) starts to increase again. The idea of early stop-
ping is to estimate the LOF on a separate test data set D_test and de-
termine the optimal time to stop learning.
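The procedure can be sketched for a simple gradient-trained model. The problem sizes, learning rate, and epoch count below are illustrative assumptions; the essential pattern is monitoring the lack-of-fit on the held-out set D_test and remembering the parameters at its minimum:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy problem: a linear model with many free parameters
# relative to the size of the noisy training set D_train.
n, d = 40, 30
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n, d))
y_train = X_train @ w_true + rng.normal(0.0, 1.0, size=n)     # noisy D_train
X_test = rng.normal(size=(200, d))
y_test = X_test @ w_true + rng.normal(0.0, 1.0, size=200)     # separate D_test

w = np.zeros(d)
lof_history = []
best_w, best_lof, best_epoch = w.copy(), np.inf, 0
for epoch in range(500):
    grad = X_train.T @ (X_train @ w - y_train) / n   # gradient of train error
    w -= 0.05 * grad
    lof = np.mean((X_test @ w - y_test) ** 2)        # estimate LOF on D_test
    lof_history.append(lof)
    if lof < best_lof:                               # remember the best model
        best_lof, best_w, best_epoch = lof, w.copy(), epoch
# best_epoch estimates the optimal stopping time: training beyond it
# mainly fits the noise in D_train while the test-set LOF stagnates or rises.
```

In practice one stops once the test-set LOF has not improved for some patience interval, rather than training to the end and looking back as in this sketch.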