Page 51 - Rapid Learning in Robotics
tween neighboring neurons in the map. This can be interpreted as a
regularization for the SOM and the “Neural-Gas” network.
3.6 Selecting the Right Network Size
Besides the accuracy criterion (LOF, Eq. 3.1), simplicity of the network
is desirable, in the spirit of Occam's Razor. The formal way is to
augment the cost function with a complexity cost term, which is often written
as a function of the number of non-constant model parameters (an additive
or multiplicative penalty, e.g. the Generalized Cross-Validation criterion
GCV; Craven and Wahba 1979).
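As a minimal sketch of such a penalized criterion (the polynomial fitting example and all function names below are illustrative, not from the text), the GCV score divides the residual error by a term that shrinks as the number of effective parameters grows, so overly complex models are penalized:

```python
import numpy as np

def gcv_score(y, y_hat, n_params):
    """Generalized Cross-Validation score (Craven and Wahba 1979):
    mean squared residual inflated by an effective-parameter penalty."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)            # residual sum of squares
    return (rss / n) / (1.0 - n_params / n) ** 2

# Toy model selection: fit polynomials of rising degree, pick the GCV minimum.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(40)

scores = {}
for degree in range(1, 10):
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    scores[degree] = gcv_score(y, y_hat, n_params=degree + 1)

best = min(scores, key=scores.get)            # degree with the lowest GCV
```

The multiplicative form shown here trades off fit quality against model size without requiring a separate validation set.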
There are several techniques to select the right network size and struc-
ture:
Trial-and-Error is probably the most prominent method in practice. A
particular network structure is constructed and evaluated, which includes
training and testing. The achieved lack-of-fit (LOF) is estimated and
minimized.
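The trial-and-error procedure can be sketched as a plain loop over candidate sizes; the RBF network, the data, and all names here are our own stand-ins for "construct, train, and test a particular network structure":

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression task: learn y = sin(x) from noisy samples.
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x) + 0.05 * rng.standard_normal((200, 1))
x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

def train_rbf(x, y, n_centers):
    """Fit a radial-basis-function network: fixed centers, linear readout."""
    centers = np.linspace(-3, 3, n_centers).reshape(-1, 1)
    def features(u):
        return np.exp(-((u - centers.T) ** 2) / 0.5)
    w, *_ = np.linalg.lstsq(features(x), y, rcond=None)
    return lambda u: features(u) @ w

def lof(model, x, y):
    """Lack-of-fit: mean squared error on held-out data."""
    return float(np.mean((model(x) - y) ** 2))

# Trial and error: construct, train, and test several sizes; keep the best.
results = {n: lof(train_rbf(x_train, y_train, n), x_test, y_test)
           for n in (2, 4, 8, 16, 32)}
best_size = min(results, key=results.get)
```

Measuring LOF on held-out data (rather than the training set) is what prevents the search from simply preferring the largest network.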
Genetic Algorithms can automate this optimization, provided a suitable
encoding of the construction parameters (the genome) can be defined.
Initially, a set of individuals (network genomes), the population, is
constructed by hand. During each epoch, the individuals
of this generation are evaluated (training and testing). Their fitnesses
(negative cost function) determine the probability of various ways of
replication, including mutations (stochastic genome modifications)
and cross-over (sexual replication with stochastic genome exchange).
The applicability and success of this method depends strongly on
the complexity of the problem, the effective representation, and the
computation time required to simulate evolution. The computation
time is governed by the product of the (non-parallelized) population
size, the fitness evaluation time, and the number of simulated
generations. For an introduction see Goldberg (1989); see, e.g., Miller,
Todd, and Hegde (1989) for optimizing the coding structure, and Montana
and Davis (1989) for weight determination.
Pruning and Weight Decay: By including a suitable non-linear complexity
penalty term in the iterative learning cost function, a fraction of