

could be extracted from a sequence of three-word sentences (Kohonen 1990; Ritter and Kohonen 1989). The topology-preserving properties enable cooperative learning, which increases the speed and robustness of learning, as studied e.g. in Walter, Martinetz, and Schulten (1991) and compared to the so-called Neural-Gas Network in Walter (1991) and Walter and Schulten (1993).

In contrast to the SOM, the Neural-Gas Network has no fixed grid topology but a "gas-like", dynamic definition of the neighborhood function, which is determined by a (dynamic) ranking of closeness in the input space (Martinetz and Schulten 1991). This is advantageous for applications with an inhomogeneous or unknown topology, e.g. the prediction of chaotic time series like the Mackey-Glass series in Walter (1991), later also published in Martinetz et al. (1993).
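The following is a minimal sketch of such a rank-based adaptation step, assuming unit weight vectors stored as rows of a NumPy array; the function name and the parameter values are illustrative and not taken from the cited works.

    import numpy as np

    def neural_gas_step(weights, x, eps=0.3, lam=2.0):
        """One rank-based Neural-Gas adaptation step.

        weights : (n_units, dim) array of reference vectors
        x       : (dim,) input sample
        eps     : learning rate (illustrative value)
        lam     : neighborhood range (illustrative value)
        """
        # Rank all units by their distance to the input (0 = closest).
        dists = np.linalg.norm(weights - x, axis=1)
        ranks = np.argsort(np.argsort(dists))
        # The adaptation strength decays with the distance rank, not with a
        # fixed grid distance as in the SOM.
        h = np.exp(-ranks / lam)
        return weights + eps * h[:, None] * (x - weights)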

The choice of the type of approximation function introduces bias and restricts the variance of the possible solutions. This fundamental relation is called the bias–variance problem (Geman et al. 1992). As indicated before, this bias and the corresponding variance reduction can be good or bad, depending on the suitability of the choice. The next section discusses the problem of over-using the variance of a chosen approximation ansatz, especially in the presence of noise.
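For reference, the decomposition behind the bias–variance problem can be written as follows, with f(x; D) denoting the approximation built from a particular training set D, E_D[.] the average over training sets, and E[y|x] the regression function; this notation is generic and not taken from this text.

    E_D[(f(x; D) - E[y|x])^2]
        = (E_D[f(x; D)] - E[y|x])^2                    (squared bias)
        + E_D[(f(x; D) - E_D[f(x; D)])^2]              (variance)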



3.5 Strategies to Avoid Over-Fitting


Over-fitting can occur when the function f is approximated in the domain D using only a too limited number of training data points D_train. If the ratio of free parameters to training points is too high, the approximation fits the noise, as illustrated by Fig. 3.4. This results in a reduced generalization ability. Besides the proper selection of an appropriate network structure, several strategies can help to avoid the over-fitting effect:

Early stopping: During incremental learning the approximation error is systematically decreased, but at some point the expected error, or lack-of-fit LOF(F, D), starts to increase again. The idea of early stopping is to estimate the LOF on a separate test data set D_test and determine the optimal time to stop learning.
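A minimal sketch of this stopping rule, assuming a generic incremental training pass train_step() and a lack-of-fit estimate lof() on a held-out data set (both hypothetical helpers, not defined in this text), might look as follows.

    import copy

    def train_with_early_stopping(model, D_train, D_test, train_step, lof,
                                  max_epochs=1000, patience=10):
        """Stop learning when the lack-of-fit on D_test no longer improves.

        train_step(model, D_train) -> model : one incremental learning pass (hypothetical)
        lof(model, D) -> float              : lack-of-fit estimate on data set D (hypothetical)
        patience                            : epochs without improvement tolerated before stopping
        """
        best_lof = float("inf")
        best_model = copy.deepcopy(model)
        epochs_since_best = 0
        for _ in range(max_epochs):
            model = train_step(model, D_train)     # training error keeps decreasing
            test_lof = lof(model, D_test)          # held-out LOF eventually rises again
            if test_lof < best_lof:
                best_lof = test_lof
                best_model = copy.deepcopy(model)  # remember the best model seen so far
                epochs_since_best = 0
            else:
                epochs_since_best += 1
                if epochs_since_best >= patience:  # assume the minimum has been passed
                    break
        return best_model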