the input units to the difference between the actual and desired output of the network
is more diluted, it can also be shown that more efficient network topologies (in terms
of the number of weights) involving two hidden layers can achieve the same level of
accuracy as a single hidden layer (Cheng and Titterington 1994;
Chester 1990).
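To make the point about efficiency in the number of weights concrete, the following short Python sketch counts the weights (including one bias per non-input unit) in fully connected feedforward topologies. The layer sizes are arbitrary illustrative choices, not figures taken from Cheng and Titterington (1994) or Chester (1990):

```python
def n_weights(layers):
    """Total number of weights, counting one bias per non-input unit,
    in a fully connected feedforward network with these layer sizes."""
    return sum((a + 1) * b for a, b in zip(layers, layers[1:]))

# One wide hidden layer versus two narrower hidden layers:
print(n_weights([10, 40, 1]))     # 10 inputs, 40 hidden, 1 output: 481 weights
print(n_weights([10, 12, 8, 1]))  # 10 inputs, 12 + 8 hidden, 1 output: 245 weights
```

Whether the smaller two-hidden-layer topology actually matches the accuracy of the larger one depends on the problem; the sketch only shows how the weight counts compare.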
The algorithms used to determine the weights such that the network as a whole
provides a good fit-to-data are not particularly of interest here. This material is
covered in various introductory textbooks on neural networks (e.g. Bishop 1995;
Gurney 1997; Hertz et al. 1991). What is of interest is that, having seen the structure
of a neural network and what it does, it is immediately clear that there is nothing in
that structure that reflects the real world, except for the assignment of input nodes
and output nodes to specific variables in the data to be fitted. Any patterns in how
the real-world mechanisms interact must be captured by the numbers of hidden
nodes and layers, a choice that essentially reflects how complex the modeller
expects the function fitted to the data to need to be.
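The point is easiest to see in code. The following minimal sketch computes the output of a one-hidden-layer network; the sizes, random weights and tanh activation are all arbitrary choices for illustration. Nothing in the structure refers to the real world until the modeller decides which variable is fed into which input unit and how the output is to be interpreted:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 5, 1              # arbitrary sizes
W1 = rng.normal(size=(n_hidden, n_in))       # input-to-hidden weights
b1 = np.zeros(n_hidden)                      # hidden biases
W2 = rng.normal(size=(n_out, n_hidden))      # hidden-to-output weights
b2 = np.zeros(n_out)                         # output biases

def forward(x):
    """Feed inputs forward through the network; the units themselves
    carry no real-world meaning, only the caller's assignment does."""
    h = np.tanh(W1 @ x + b1)                 # hidden-layer activations
    return W2 @ h + b2                       # network output

print(forward(np.array([0.2, -1.0, 0.5])))
```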
Neural networks have the absolute minimum of ontological structure it is possible
for a model to have. Their ‘content’ comes from the data they are trained to fit. We
thus next discuss the principles behind adjusting a model to fit its data, checking a
model’s fit to the available evidence, and how this is done in neural networks.
8.1.2 Calibration, Validation and Generalization in Neural Networks
Calibration, validation and generalization are three steps in the development and
application of any model. We discuss them here in relation to neural networks, first
to clarify what we mean by those terms and second to discuss some of the ways in
which generalization (the application of the model) can go wrong even for a
well-validated model.
Since various terms are used in the modelling literature for the three processes
intended here by the words ‘calibration’, ‘validation’ and ‘generalization’, it is
best to be clear what is meant. The process begins with a set of data, with some
explanatory (input) variables and response (output) variables, and a model with a
predefined structure that has some parameters that can be adjusted. The data are split
into two not necessarily equal parts. The larger part is typically used for calibration:
adjusting the parameters so as to minimize the difference (the error) between the
model’s predictions for the input variables in the data and the corresponding output
variables. In neural networks this is referred to as training, and it entails adjusting
the values of all the weights.
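As a concrete (and deliberately simplified) sketch of this process, the following Python code splits a synthetic data set into a larger calibration part and a smaller validation part, then trains a small one-hidden-layer network by gradient descent on the squared error over the calibration part. The data, network size and learning rate are all arbitrary choices, and the gradients are computed up to a constant factor absorbed into the learning rate:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))        # explanatory (input) variables
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.05, size=200)  # responses

split = 150                                  # larger part used for calibration
X_cal, y_cal = X[:split], y[:split]
X_val, y_val = X[split:], y[split:]

W1 = rng.normal(0, 0.5, size=(8, 2))         # input-to-hidden weights
b1 = np.zeros(8)
w2 = rng.normal(0, 0.5, size=8)              # hidden-to-output weights
b2 = 0.0

def predict(X):
    H = np.tanh(X @ W1.T + b1)               # hidden activations
    return H @ w2 + b2, H

lr = 0.05
for epoch in range(2000):
    out, H = predict(X_cal)
    err = out - y_cal                        # error on the calibration data
    # Backpropagate the squared-error gradient to all the weights:
    dH = np.outer(err, w2) * (1 - H ** 2)
    w2 -= lr * H.T @ err / len(err)
    b2 -= lr * err.mean()
    W1 -= lr * dH.T @ X_cal / len(err)
    b1 -= lr * dH.mean(axis=0)

print("calibration error:", np.mean((predict(X_cal)[0] - y_cal) ** 2))
print("validation error: ", np.mean((predict(X_val)[0] - y_val) ** 2))
```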
There is a caveat to the use of the term ‘minimization’. For reasons such as
measurement error in the data, a function that fits the data exactly is potentially
undesirable: it has fitted the noise as well as the signal, which is seen as overfitting.
So, when we say we want to minimize the error, it is usually understood that we
wish to do so without overfitting.
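To illustrate why an exact fit is undesirable, the following self-contained sketch uses polynomial fits in place of a neural network; the principle is the same, in that a sufficiently flexible function will fit the noise as well as the signal. The data and the polynomial degrees are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=30)  # noisy observations
x_cal, y_cal = x[::2], y[::2]                            # calibration half
x_val, y_val = x[1::2], y[1::2]                          # validation half

for degree in (1, 3, 12):                    # 12 is high enough to chase the noise
    coeffs = np.polyfit(x_cal, y_cal, degree)            # 'calibrate' the polynomial
    cal_err = np.mean((np.polyval(coeffs, x_cal) - y_cal) ** 2)
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {degree:2d}: calibration error {cal_err:.4f}, "
          f"validation error {val_err:.4f}")
```

In runs of this sketch, the calibration error falls as the degree rises, while the validation error eventually worsens: the hallmark of overfitting.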