Page 51 - Rapid Learning in Robotics



                       tween neighboring neurons in the map. This can be interpreted as a
                       regularization for the SOM and the “Neural-Gas” network.



                 3.6 Selecting the Right Network Size


                 Besides the accuracy criterion (LOF, Eq. 3.1), the simplicity of the network
                 is desirable, similar to the idea of Occam's Razor. The formal way is to
                 augment the cost function by a complexity cost term, which is often written
                 as a function of the number of non-constant model parameters (additive
                 or multiplicative penalty, e.g. the Generalized Cross-Validation criterion
                 GCV; Craven and Wahba 1979).
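As a minimal sketch of such a complexity-penalized cost, the following Python fragment scores models of different size with a multiplicative penalty of the form LOF / (1 - p/N)^2, which is the shape the GCV criterion takes for a linear least-squares fit with p free parameters and N samples. The model family (a step function with `n_bins` levels standing in for a network with that many units), the toy data, and all function names are assumptions made for illustration; they are not taken from the text.

```python
import math
import random

def fit_piecewise_constant(xs, ys, n_bins):
    """Least-squares fit of a step function with n_bins free levels on [0, 1)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    # The bin mean minimizes the squared error inside each bin;
    # empty bins fall back to 0.0 (an arbitrary choice for this sketch).
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def lack_of_fit(xs, ys, levels):
    """Mean squared residual of the step function on the given data."""
    n_bins = len(levels)
    total = 0.0
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        total += (y - levels[b]) ** 2
    return total / len(xs)

def penalized_cost(lof, n_params, n_samples):
    """Multiplicative GCV-style penalty: LOF / (1 - p/N)^2."""
    if n_params >= n_samples:
        return math.inf  # as many parameters as samples: penalty diverges
    return lof / (1.0 - n_params / n_samples) ** 2

def select_size(xs, ys, candidate_sizes):
    """Return the size with the smallest penalized cost, plus all costs."""
    costs = {}
    for k in candidate_sizes:
        levels = fit_piecewise_constant(xs, ys, k)
        costs[k] = penalized_cost(lack_of_fit(xs, ys, levels), k, len(xs))
    return min(costs, key=costs.get), costs

# Toy data (assumed): one noisy sine period sampled at 60 points.
rng = random.Random(0)
xs = [i / 60 for i in range(60)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.1) for x in xs]
best_size, costs = select_size(xs, ys, [1, 2, 4, 8, 16, 32, 60])
```

Without the penalty, the largest model would always win on the training data. The penalty term lets a mid-sized model win instead, because the denominator shrinks as the parameter count approaches the number of samples.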
                     There are several techniques to select the right network size and struc-
                 ture:

                 Trial-and-Error is probably the most prominent method in practice. A
                       particular network structure is constructed and evaluated, which
                       includes training and testing. The achieved lack-of-fit (LOF) is
                       estimated and minimized.


                 Genetic Algorithms can automate this optimization, provided that a
                       suitable encoding of the construction parameters, the genome, can be
                       defined. Initially, a set of individuals (network genomes), the
                       population, is constructed by hand. During each epoch, the individuals
                       of this generation are evaluated (training and testing). Their fitnesses
                       (negative cost function) determine the probability of various ways of
                       replication, including mutations (stochastic genome modifications)
                       and cross-over (sexual replication with stochastic genome exchange).
                       The applicability and success of this method depend strongly on
                       the complexity of the problem, the effective representation, and the
                       computation time required to simulate evolution. The computation
                       time is governed by the product of the (non-parallelized) population
                       size, the fitness evaluation time, and the number of simulated
                       generations. For an introduction see Goldberg (1989); see, e.g., Miller,
                       Todd, and Hegde (1989) for optimizing the coding structure and
                       Montana and Davis (1989) for determining the weights.
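The evolutionary loop described in this item can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the genome is a bit string marking which units are active, `toy_cost` is a hypothetical stand-in for the expensive "train and test the encoded network" step, the initial population is drawn at random rather than constructed by hand, and tournament selection with elitism is just one of several ways to let fitness determine the chance of replication.

```python
import random

TARGET_UNITS = 5  # assumed: the toy problem is best solved with 5 active units

def toy_cost(genome):
    """Hypothetical stand-in for training and testing the encoded network."""
    active = sum(genome)
    # Misfit grows with the distance from TARGET_UNITS; 0.1 per unit is a
    # small additive complexity cost.
    return (active - TARGET_UNITS) ** 2 + 0.1 * active

def fitness(genome):
    """Fitness is the negative cost function."""
    return -toy_cost(genome)

def tournament(population, rng, size=3):
    """Pick the fittest of a few randomly drawn individuals."""
    return max(rng.sample(population, size), key=fitness)

def crossover(a, b, rng):
    """One-point cross-over: head of parent a, tail of parent b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(genome, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]

def evolve(genome_length=20, population_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        children = [max(population, key=fitness)]  # elitism: keep the best
        while len(children) < population_size:
            child = crossover(tournament(population, rng),
                              tournament(population, rng), rng)
            children.append(mutate(child, rng, mutation_rate))
        population = children
    return max(population, key=fitness)

best = evolve()
```

For brevity this sketch recomputes the fitness whenever it is needed. When each evaluation means training and testing a network, the fitness of every individual would be computed once per generation and cached, so that the run time follows the product of population size, evaluation time, and number of generations.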

                 Pruning and Weight Decay: By adding a suitable non-linear complexity
                       penalty term to the iterative learning cost function, a fraction of