146 G. Polhill
A final problem is a consequence of encoding variables that have nominal values.
Even with an appropriate encoding of nominals in the input variables of the model,
the calibration and validation data may have covered only a subset of the values
the nominal variable can take. Generalization may, however, be required for a
value of the nominal that was not in the data used to construct the model. For neural networks,
this is less of an issue than with symbolic AI machine learning algorithms: one of
the supposed advantages of neural networks is that they are less ‘brittle’ with respect
to the language of representation of the states of the world, because they do not rely
on the language having a specific vocabulary to represent every possible state that
might ever be of interest (Aha 1992; Hanson and Burr 1990; Holland 1986).
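The unseen-nominal problem can be made concrete with a minimal sketch of one-hot encoding, a common way of presenting nominal values to a neural network. The category names here are hypothetical illustrations, not taken from the chapter:

```python
# Hypothetical example: one-hot encoding of a nominal input variable.
# The category names ("arable", "pasture", ...) are illustrative only.
def make_one_hot_encoder(categories):
    """Return a function mapping a nominal value to a one-hot vector."""
    index = {c: i for i, c in enumerate(categories)}

    def encode(value):
        vec = [0.0] * len(categories)
        if value in index:
            vec[index[value]] = 1.0
        # An unseen nominal yields the all-zero vector: the network
        # receives an input pattern it was never calibrated on, rather
        # than failing outright as a fixed symbolic vocabulary might.
        return vec

    return encode

encode = make_one_hot_encoder(["arable", "pasture", "forest"])
print(encode("pasture"))  # value seen during calibration
print(encode("urban"))    # nominal value absent from the calibration data
```

The all-zero vector for `"urban"` illustrates the point in the text: the network still produces some output for the unseen value, whereas a representation language lacking that symbol could not express the state at all.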
In essence, calibration is the process of finding the parameters of a neural
network (or more generally, any model) that best fit your data. Validation is the
process of establishing the confidence you can expect to have in the predictions of
the model, based on the data you have. Generalization is the capability of a model
to make predictions in new situations. There are various reasons why that capability
may be questioned. Apart from the relevance of the data used for calibration and
validation in the new context, the reasons relate to how the modeller chose to encode,
or represent, the data.
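The three concepts can be separated cleanly in a minimal sketch. The one-parameter model and the data below are illustrative assumptions, not material from the chapter:

```python
# Minimal sketch: calibration, validation and generalization for a
# one-parameter model y = w * x. The data are made up for illustration.
def calibrate(data):
    """Least-squares estimate of w for y = w * x (calibration)."""
    num = sum(x * y for x, y in data)
    den = sum(x * x for x, _ in data)
    return num / den

def validate(w, data):
    """Mean squared error on held-out data (validation)."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

calibration_data = [(1, 2.1), (2, 3.9), (3, 6.2)]
validation_data = [(4, 8.1), (5, 9.8)]

w = calibrate(calibration_data)          # parameters that best fit the data
mse = validate(w, validation_data)       # confidence in the predictions
prediction = w * 10                      # generalization: a new situation
print(w, mse, prediction)
```

The point of the split is that `mse` is computed on data the calibration never saw, so it estimates the confidence one can have in `prediction`; whether that estimate carries over to x = 10 is exactly the generalization question raised in the text.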
8.1.3 Bias vs. Variance
The representation of the data is not the only choice the modeller makes. This
section covers the dilemma a modeller faces when choosing the structure of the
model. In the case of neural networks, that structure is the number of layers and
hidden units, which collectively determine the number of weights or parameters the
model has. The fewer the number of parameters, the easier the model is to calibrate,
but there is a risk of oversimplification. Since it is so easy to add parameters to
a neural network, there is a temptation to do so. We introduce some
rather advanced mathematics (Vapnik-Chervonenkis theory) to argue that in terms
of demand for data, adding more parameters can be exponentially costly.
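As a hedged sketch of the kind of result Vapnik-Chervonenkis theory provides (this particular form follows a standard textbook statement of the VC generalization bound, not a formula from this chapter): for a hypothesis class with VC dimension $d_{\mathrm{VC}}$ and a calibration set of $m$ examples, with probability at least $1-\delta$,

```latex
E_{\text{out}}(h) \;\le\; E_{\text{in}}(h) \;+\;
\sqrt{\frac{8}{m}\,\ln\!\frac{4\left((2m)^{d_{\mathrm{VC}}}+1\right)}{\delta}}
```

Here $E_{\text{in}}$ is the error on the calibration data and $E_{\text{out}}$ the out-of-sample error. Since $d_{\mathrm{VC}}$ grows with the number of weights, each added parameter increases the number of examples $m$ needed to keep the bound usefully tight, which is the sense in which extra parameters create a demand for data.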
Not all approaches using mathematical functions are ontology-free in the way
neural networks are. If we are modelling oscillatory systems, for example, we
might start with trigonometric functions. In general, the set of functions we are
willing to consider for modelling a system constitutes our ‘bias’ – the smaller the
set of functions, the greater the bias. Even neural networks have a ‘bias’ (not to be
confused with the ‘bias’ node in the network itself), which is inversely related to the
number of parameters (weights) in the network. In an ideal world, we would have a
very high bias that constrained the set of functions we would consider so much that
calibration, the search for ‘the’ function we are going to accept as modelling the
target system, would be trivial. The price to pay for this bias is that the set of
functions we are willing to consider may not fit the data very well; were we
willing to expand that set of functions, we could achieve a much better fit to
the data. The opposite of this meaning of ‘bias’ is ‘variance’; in neural networks, this