
146                                                         G. Polhill

              A final problem is a consequence of encoding variables that have nominal values.
            Even assuming an appropriate encoding of nominals in the model's input variables,
            the calibration and validation data may cover only a subset of the values the
            variable can take. Generalization may, however, be required for a nominal value
            that did not appear in the data used to construct the model. For neural networks,
            this is less of an issue than for symbolic AI machine learning algorithms: one of
            the supposed advantages of neural networks is that they are less ‘brittle’ with respect
            to the language used to represent the states of the world, because they do not rely
            on the language having a specific vocabulary to represent every possible state that
            might ever be of interest (Aha 1992; Hanson and Burr 1990; Holland 1986).
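The unseen-nominal problem above can be made concrete with a minimal sketch of one-hot encoding, the usual way of presenting nominal values to a network's inputs. The category names and data are hypothetical illustrations, not drawn from the chapter:

```python
# Sketch of the unseen-nominal problem with one-hot encoding.
# Category names here are hypothetical illustrations.

def one_hot(value, categories):
    """Encode a nominal value as a one-hot vector over the known categories."""
    if value not in categories:
        raise ValueError(f"unseen nominal value: {value!r}")
    return [1.0 if c == value else 0.0 for c in categories]

# Values observed in the calibration and validation data:
seen = ["arable", "pasture", "forest"]

print(one_hot("pasture", seen))   # [0.0, 1.0, 0.0]

# A new situation presents a value absent from that data,
# so there is simply no input code for it:
try:
    one_hot("urban", seen)
except ValueError as err:
    print(err)                    # unseen nominal value: 'urban'
```

The failure here is structural: whatever the model learned, its input layer has no slot for the new category, which is the sense in which the encoding, rather than the learning algorithm, limits generalization.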
              In essence, calibration is the process of finding the parameters of a neural
            network (or, more generally, any model) that best fit your data. Validation is the
            process of establishing the confidence you can expect to have in the predictions of
            the model, given the data you have. Generalization is the capability of a model
            to make predictions in new situations. There are various reasons why that capability
            may be questioned. Apart from the relevance of the calibration and validation data
            to the new context, the reasons relate to how the modeller chose to encode,
            or represent, the data.
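The distinction between calibration and validation can be sketched with a deliberately tiny model, a one-parameter function y = a·x fitted by least squares. The data sets are invented for illustration; the point is only the separation of roles between the two data sets:

```python
# Toy illustration of calibration vs. validation (hypothetical data).
# Model: y = a * x, with a single parameter a.

def calibrate(xs, ys):
    """Least-squares estimate of a for y = a * x (the calibration step)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def mean_squared_error(a, xs, ys):
    """Prediction error of the fitted model on a data set."""
    return sum((y - a * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Calibration data: used to find the parameter value.
x_cal, y_cal = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
a = calibrate(x_cal, y_cal)

# Validation data: held out, used only to assess how much
# confidence the model's predictions deserve.
x_val, y_val = [4.0, 5.0], [8.3, 9.8]
print(a)
print(mean_squared_error(a, x_val, y_val))
```

Generalization then concerns situations resembling neither data set, which no amount of validation error on held-out data can directly measure.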




            8.1.3 Bias vs. Variance

            The representation of the data is not the only choice the modeller makes. This
            section covers the dilemma a modeller faces when choosing the structure of the
            model. In the case of neural networks, that structure is the number of layers and
            hidden units, which collectively determine the number of weights, or parameters, the
            model has. The fewer the parameters, the easier the model is to calibrate,
            but the greater the risk of oversimplification. Since it is so easy to add more
            parameters to a neural network, there is a temptation to do so. We introduce some
            rather advanced mathematics (Vapnik-Chervonenkis theory) to argue that, in terms
            of demand for data, adding more parameters can be exponentially costly.
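How quickly layers and hidden units translate into parameters can be seen from a short count for a fully connected feed-forward network; each layer contributes (inputs + 1) × outputs weights, the extra one being the bias node. The layer sizes below are illustrative, not taken from the chapter:

```python
# Count the weights in a fully connected feed-forward network.
# Each pair of adjacent layers contributes (n_in + 1) * n_out
# parameters; the "+ 1" is the bias node feeding each unit.

def n_parameters(layer_sizes):
    """Total number of weights across consecutive fully connected layers."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Illustrative architectures with 5 inputs and 1 output:
print(n_parameters([5, 3, 1]))       # 22
print(n_parameters([5, 10, 1]))      # 71
print(n_parameters([5, 10, 10, 1]))  # 181
```

A few extra hidden units multiply the parameter count, which is the temptation the text warns about: the network grows far faster than the modeller's intuition about its data requirements.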
              Not all approaches using mathematical functions are ontology-free in the way
            neural networks are. If we are modelling oscillatory systems, for example, we
            might start with trigonometric functions. In general, the set of functions we are
            willing to consider for modelling a system constitutes our ‘bias’ – the smaller the
            set of functions, the greater the bias. Even neural networks have a ‘bias’ (not to be
            confused with the ‘bias’ node in the network itself), which is inversely related to the
            number of parameters (weights) in the network. In an ideal world, we would have a
            very high bias that constrained the set of functions under consideration so much that
            calibration, the search for ‘the’ function we are going to accept as modelling the
            target system, would be trivial. The price to pay for this bias is that the set of
            functions we are willing to consider may not fit the data very well; if only we
            were willing to expand that set of functions, we could achieve a much better fit to
            the data. The opposite of this meaning of ‘bias’ is ‘variance’; in neural networks, this