220 A. Evans et al.
3. Tracking the causal processes through the model.
It may seem obvious, and yet it is worth pointing out, that model outputs can only
causally relate to model inputs, not to additional data in the real world. Plainly,
insights into the system can come from comparison with external data that correlates,
or fails to correlate, with model outputs, but this is not the same as understanding your
model and the way it currently represents the system. One might therefore imagine that
comparing a model with other, external data cannot help us understand the model itself,
and yet it can often be worth:
4. Comparing model results with real-world data, because the relationships between
real data and both model inputs and model outputs may be clearer than the
relationships between inputs and outputs within the model.
Let’s imagine, for example, a model that predicts the location of burglaries over the
course of a day in a city region where police corruption is rife. The model inputs are
known offenders’ homes, potential target locations and their attractiveness, the
positions of the targets’ owners, and the positions of the police, who prefer to serve
the wealthy. We may be able to recognise a pattern of burglaries that moves, over the
course of the day, from the suburbs to the city centre. Although we have built into our
model the fact that police respond faster to richer people, we may find, using (1), that
our model doesn’t show fewer burglaries in rich areas, because the rich areas are so
widely dispersed that police response times are stretched between them. We can then
alter the weighting of the bias away from the wealthy (2) to see whether this actually
reduces the burglary rate in the rich areas, by placing police nearer these
neighbourhoods as a side effect of responding more to poor people. We may be able to
fully understand this aspect of the model and how it arises (3), but still find a
higher-than-expected burglary rate in wealthy areas. Finally, it may turn out (4) that
there is a strong relationship between these burglaries and real data on petrol sales,
for no other reason than that both are high at transition times in this social system,
when the police would be most stretched between regions. This would suggest, in turn,
that the change in police locations over time is as important as their positions at
any one time.
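Step (4) in this example comes down to correlating one model output series with one external series. The following is a minimal Python sketch of a Pearson correlation, assuming hourly totals of modelled burglaries in wealthy areas and hourly petrol sales. Both series and the helper name `pearson_r` are hypothetical, invented only to mimic peaks at the morning and evening transition times; they are not output from any real model or dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical, illustrative values only: one entry per hour of the day being studied.
model_burglaries = [2, 1, 1, 3, 9, 12, 6, 4, 3, 4, 10, 13, 7, 3]
petrol_sales = [20, 18, 19, 30, 75, 90, 50, 35, 30, 38, 80, 95, 55, 25]

r = pearson_r(model_burglaries, petrol_sales)
print(f"r = {r:.3f}")
```

A high coefficient says only that the two series rise and fall together. As in the example above, it is a prompt to look for a shared driver (here, the transition times), not evidence that one series causes the other.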
Let us look at each of these methodologies for developing understanding in turn.
Correlation Most social scientists will be familiar with linear regression as a
means of describing data or testing for a relationship between two variables; there is
a long scientific tradition of correlating model data with external variables, and
this tradition is equally applicable to comparisons within a model. Correlating
datasets is also a task that lends itself readily to automation. As an exploratory
tool, regression modelling has its attractions, not least its simplicity in both concept
and execution. Simple regressions can be run in desktop applications like
Microsoft Excel, as well as in all the major statistical packages (R, SAS, SPSS, etc.).
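To show how little machinery a single-predictor regression needs, here is a minimal Python sketch (standard library only) of ordinary least squares applied within a model: one output regressed on one input across a set of runs, as might be collected in a parameter sweep. The function name `simple_ols` and the sweep values are hypothetical, chosen purely for illustration.

```python
def simple_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("input series is constant; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # R^2 is undefined when the output never varies.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return intercept, slope, r_squared


# Hypothetical sweep: five values of one input parameter and the mean of one output.
param_values = [0.0, 0.25, 0.5, 0.75, 1.0]
mean_output = [4.1, 4.6, 5.2, 5.5, 6.1]

a, b, r2 = simple_ols(param_values, mean_output)
print(f"output ~ {a:.2f} + {b:.2f} * param (R^2 = {r2:.3f})")
```

In practice one would more likely use R, a statistics package, or a library such as SciPy or statsmodels, which also report standard errors and p-values; the hand-rolled version simply makes the arithmetic visible.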
Standard methods are well known for cross-correlating both continuous, normally
distributed data and time series. However, even for simple analyses with a single
input and a single output variable, linear regression is not always an appropriate
technique. For example, logistic regression models will be more appropriate for
binary response data, Poisson models will be superior when values in the dependent