the input units to the difference between the actual and desired output of the network
is more diluted, it can also be shown that more efficient network topologies (in terms
of the number of weights) involving two hidden layers can achieve the same level of
accuracy as a single hidden layer (Cheng and Titterington 1994;
Chester 1990).
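To make the point about efficiency in the number of weights concrete, the following short Python sketch counts the weights (including one bias per non-input unit) in fully connected feedforward topologies. The layer sizes are arbitrary illustrative choices, not figures taken from Cheng and Titterington (1994) or Chester (1990):

```python
def n_weights(layers):
    """Total number of weights, counting one bias per non-input unit,
    in a fully connected feedforward network with these layer sizes."""
    return sum((a + 1) * b for a, b in zip(layers, layers[1:]))

# One wide hidden layer versus two narrower hidden layers:
print(n_weights([10, 40, 1]))     # 10 inputs, 40 hidden, 1 output: 481 weights
print(n_weights([10, 12, 8, 1]))  # 10 inputs, 12 + 8 hidden, 1 output: 245 weights
```

Whether the smaller two-hidden-layer topology actually matches the accuracy of the larger one depends on the problem; the sketch only shows how the weight counts compare.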
The algorithms used to determine the weights such that the network as a whole
provides a good fit-to-data are not particularly of interest here. This material is
covered in various introductory textbooks on neural networks (e.g. Bishop 1995;
Gurney 1997; Hertz et al. 1991). What is of interest is that, having seen the structure
of a neural network and what it does, it is immediately clear that there is nothing in
that structure that reflects the real world, except for the assignment of input nodes
and output nodes to specific variables in the data to be fitted. Any patterns in how
the real-world mechanisms interact must be captured by the numbers of hidden
nodes and layers, a choice that essentially reflects how complex the modeller
expects the function fitted to the data to need to be.
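The point is easiest to see in code. The following minimal sketch computes the output of a one-hidden-layer network; the sizes, random weights and tanh activation are all arbitrary choices for illustration. Nothing in the structure refers to the real world until the modeller decides which variable is fed into which input unit and how the output is to be interpreted:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 5, 1              # arbitrary sizes
W1 = rng.normal(size=(n_hidden, n_in))       # input-to-hidden weights
b1 = np.zeros(n_hidden)                      # hidden biases
W2 = rng.normal(size=(n_out, n_hidden))      # hidden-to-output weights
b2 = np.zeros(n_out)                         # output biases

def forward(x):
    """Feed inputs forward through the network; the units themselves
    carry no real-world meaning, only the caller's assignment does."""
    h = np.tanh(W1 @ x + b1)                 # hidden-layer activations
    return W2 @ h + b2                       # network output

print(forward(np.array([0.2, -1.0, 0.5])))
```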
Neural networks have the absolute minimum of ontological structure it is possible
for a model to have. Their ‘content’ comes from the data they are trained to fit. We
thus next discuss the principles behind adjusting a model to fit its data, checking a
model’s fit to the available evidence, and how this is done in neural networks.
8.1.2 Calibration, Validation and Generalization in Neural Networks
Calibration, validation and generalization are three steps in the development and
application of any model. We discuss them here in relation to neural networks, first
to clarify what we mean by those terms and second to discuss some of the ways in
which generalization (the application of the model) can go wrong even for a
well-validated model.
Since various terms are used in the modelling literature for the three processes
intended here by the words ‘calibration’, ‘validation’ and ‘generalization’, it is
best to be clear what is meant. The process begins with a set of data, with some
explanatory (input) variables and response (output) variables, and a model with a
predefined structure that has some parameters that can be adjusted. The data are split
into two not necessarily equal parts. The larger part is typically used for calibration:
adjusting the parameters so as to minimize the difference (the error) between the
model’s predictions for the input variables in the data and the corresponding output
variables. In neural networks this is referred to as training, and it entails adjusting
the values of all the weights.
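As a concrete (and deliberately simplified) sketch of this process, the following Python code splits a synthetic data set into a larger calibration part and a smaller validation part, then trains a small one-hidden-layer network by gradient descent on the squared error over the calibration part. The data, network size and learning rate are all arbitrary choices, and the gradients are computed up to a constant factor absorbed into the learning rate:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))        # explanatory (input) variables
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.05, size=200)  # responses

split = 150                                  # larger part used for calibration
X_cal, y_cal = X[:split], y[:split]
X_val, y_val = X[split:], y[split:]

W1 = rng.normal(0, 0.5, size=(8, 2))         # input-to-hidden weights
b1 = np.zeros(8)
w2 = rng.normal(0, 0.5, size=8)              # hidden-to-output weights
b2 = 0.0

def predict(X):
    H = np.tanh(X @ W1.T + b1)               # hidden activations
    return H @ w2 + b2, H

lr = 0.05
for epoch in range(2000):
    out, H = predict(X_cal)
    err = out - y_cal                        # error on the calibration data
    # Backpropagate the squared-error gradient to all the weights:
    dH = np.outer(err, w2) * (1 - H ** 2)
    w2 -= lr * H.T @ err / len(err)
    b2 -= lr * err.mean()
    W1 -= lr * dH.T @ X_cal / len(err)
    b1 -= lr * dH.mean(axis=0)

print("calibration error:", np.mean((predict(X_cal)[0] - y_cal) ** 2))
print("validation error: ", np.mean((predict(X_val)[0] - y_val) ** 2))
```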
There is a caveat to the use of the term ‘minimization’. For reasons such as
measurement error in the data, a function that fits the data exactly is potentially
undesirable: it has fitted the noise as well as the signal, which is seen as overfitting.
So, when we say we want to minimize the error, it is usually understood that we
wish to do so without overfitting.
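To illustrate why an exact fit is undesirable, the following self-contained sketch uses polynomial fits in place of a neural network; the principle is the same, in that a sufficiently flexible function will fit the noise as well as the signal. The data and the polynomial degrees are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=30)  # noisy observations
x_cal, y_cal = x[::2], y[::2]                            # calibration half
x_val, y_val = x[1::2], y[1::2]                          # validation half

for degree in (1, 3, 12):                    # 12 is high enough to chase the noise
    coeffs = np.polyfit(x_cal, y_cal, degree)            # 'calibrate' the polynomial
    cal_err = np.mean((np.polyval(coeffs, x_cal) - y_cal) ** 2)
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {degree:2d}: calibration error {cal_err:.4f}, "
          f"validation error {val_err:.4f}")
```

In runs of this sketch, the calibration error falls as the degree rises, while the validation error eventually worsens: the hallmark of overfitting.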