
            the input units to the difference between the actual and desired output of the network
            is more diluted, it can also be shown that more efficient network topologies (in terms
            of number of weights) involving two hidden layers can achieve the same level of
            accuracy as can be achieved with one hidden layer (Cheng and Titterington 1994;
            Chester 1990).
              The algorithms used to determine the weights such that the network as a whole
            provides a good fit to the data are not of particular interest here. This material is
            covered in various introductory textbooks on neural networks (e.g. Bishop 1995;
            Gurney 1997; Hertz et al. 1991). What is of interest is that, having seen the structure
            of a neural network and what it does, it is immediately clear that there is nothing in
            that structure that reflects the real world, except for the assignment of input nodes
            and output nodes to specific variables in the data to be fitted. The numbers of hidden
            nodes and layers must capture any patterns in how the real-world mechanisms
            interact; the choice of these numbers essentially reflects how complex the modeller
            expects the function fitted to the data to need to be.
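              To make this concrete, the sketch below (not the chapter’s own code: the NumPy
            implementation, layer sizes and variable names are illustrative assumptions) sets up
            a feedforward network with two hidden layers. The only problem-specific structure is
            the assignment of the input and output units to data variables; everything else is
            either an adjustable weight or a size chosen by the modeller.

import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 3 input variables, two hidden layers of 8 and 4 units,
# and 1 output variable.
n_in, n_h1, n_h2, n_out = 3, 8, 4, 1

# The weights and biases are the adjustable parameters; their values, not
# the structure, carry whatever the network 'knows' about the data.
W1 = rng.normal(scale=0.1, size=(n_in, n_h1));  b1 = np.zeros(n_h1)
W2 = rng.normal(scale=0.1, size=(n_h1, n_h2));  b2 = np.zeros(n_h2)
W3 = rng.normal(scale=0.1, size=(n_h2, n_out)); b3 = np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    # Map the input variables to the output variables; nothing in this
    # structure reflects the real-world mechanisms being modelled.
    h1 = sigmoid(x @ W1 + b1)
    h2 = sigmoid(h1 @ W2 + b2)
    return h2 @ W3 + b3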
              Neural networks have the absolute minimum of ontological structure it is possible
            for a model to have; their ‘content’ comes from the data they are trained to fit. We
            thus turn next to the principles behind adjusting a model to fit its data, checking a
            model’s fit to the available evidence, and how this is done in neural networks.



            8.1.2 Calibration, Validation and Generalization in Neural
                   Networks


            Calibration, validation and generalization are three steps in the development and
            application of any model. We discuss them here in relation to neural networks, first
            to clarify what we mean by those terms and second to discuss some of the ways in
            which generalization (the application of the model) can go wrong even for a
            well-validated model.
              Since various terms are used in the modelling literature for the three processes
            intended here by the words ‘calibration’, ‘validation’ and ‘generalization’, it is
            best to be clear what is meant. The process begins with a set of data, with some
            explanatory (input) variables and response (output) variables, and a model with a
            predefined structure that has some parameters that can be adjusted. The data are split
            into two not necessarily equal parts. The larger part is typically used for calibration:
            adjusting the parameters so that the difference between the model’s predictions for
            the input variables in the data and the corresponding output variables (the error) is
            minimized. In neural networks, this is referred to as training and entails
            adjusting the values of all the weights.
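              As a concrete illustration of the calibration step, the sketch below (again an
            assumption-laden example rather than the chapter’s own procedure: synthetic data, a
            single hidden layer, a squared-error measure and plain gradient descent) splits a data
            set into a larger calibration part and a smaller held-back part, and then repeatedly
            adjusts all the weights to reduce the error on the calibration part.

import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 200 cases, two explanatory (input) variables and one
# response (output) variable, with some measurement noise.
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(X[:, :1]) + 0.5 * X[:, 1:] + rng.normal(scale=0.05, size=(200, 1))

# Split into two not necessarily equal parts: 75% for calibration, 25% held back.
idx = rng.permutation(len(X))
n_cal = int(0.75 * len(X))
X_cal, y_cal = X[idx[:n_cal]], y[idx[:n_cal]]
X_held, y_held = X[idx[n_cal:]], y[idx[n_cal:]]

# One hidden layer of 6 units (an arbitrary choice for illustration).
W1 = rng.normal(scale=0.5, size=(2, 6)); b1 = np.zeros(6)
W2 = rng.normal(scale=0.5, size=(6, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 0.1
for epoch in range(2000):
    # Forward pass: the model's predictions for the calibration inputs.
    h = sigmoid(X_cal @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y_cal                 # the 'error' to be minimized

    # Backward pass: gradients of the mean squared error with respect to
    # every weight and bias.
    n = len(X_cal)
    d_pred = 2 * err / n
    dW2 = h.T @ d_pred;  db2 = d_pred.sum(axis=0)
    d_h = (d_pred @ W2.T) * h * (1 - h)
    dW1 = X_cal.T @ d_h; db1 = d_h.sum(axis=0)

    # 'Training' adjusts the values of all the weights.
    W1 -= learning_rate * dW1; b1 -= learning_rate * db1
    W2 -= learning_rate * dW2; b2 -= learning_rate * db2

            The held-back part (X_held, y_held) plays no role in adjusting the weights; it is
            reserved for checking the fit, which is the subject of the validation step.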
              There is a caveat to the use of the term ‘minimization’. For reasons such as
            measurement error in the data, a function capable of providing an exact fit to the
            data is potentially undesirable: it fits the noise as well as the underlying
            relationship, which is known as overfitting. So, when we say we want to minimize the
            error, it is usually understood that we wish to do so without overfitting.
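              One common practical guard against overfitting, sketched below using scikit-learn’s
            MLPRegressor (the data, layer sizes and settings are illustrative assumptions, not the
            chapter’s prescription), is to hold back a fraction of the calibration data and stop
            training when the error on that held-back fraction stops improving, rather than
            driving the calibration error as low as it will go.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=300)

model = MLPRegressor(
    hidden_layer_sizes=(8, 4),  # two hidden layers, sizes chosen by the modeller
    early_stopping=True,        # stop when the held-back error stops improving
    validation_fraction=0.2,    # share of the data held back for that check
    max_iter=5000,
    random_state=0,
)
model.fit(X, y)
print("stopped after", model.n_iter_, "iterations")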