In addition, the last 10 years or so have seen considerable theoretical advances in the
determination of the probabilities of causation (e.g. Granger 1980; Pearl and Verma
1991; Greenland and Pearl 2006). For now, however, the tracking of causality is
much easier if the models build in appropriate structures from the start. While still in their infancy, techniques such as process calculi (Worboys 2005) and Petri nets
show the potential of this area.
The inability to track causality leads to the perennial problem of identifiability,
that is, that a single model outcome may have more than one history of model
parameters that leads to it. Identifiability is part of a larger set of issues with
confirming that the model in the computer accurately reflects the system in the
real world—the so-called equifinality issue. These are issues that play out strongly
during model construction from real data and when validating a model against real
data, and a review of techniques to examine these problems, including using model
variation to determine the suitability of variables and parameters, can be found in
Evans (2012). At the model stage we are interested in, however, we at least have the
advantage that there is only one potential model that may have created the output—
the one running. Nevertheless, issues with the identifiability of the parameters in a running model still make it hard to say definitively when model behaviour is reasonable.
For those modelling for prediction, this is of little consequence—as long as the
model gives consistently good predictions it may as well be a black box. However,
if we wish to tease the model apart and look at how results have emerged, these
issues become more problematic.
The mechanisms for dealing with these problems are pragmatic:
1. Examine the stability of the calibration process and/or the state of internal
variables that weren’t inputs or outputs across multiple runs.
2. Validate, against real data, internal variables that weren't used as inputs or outputs in any calibration.
3. Run the model in a predictive mode with as many different datasets as possible—
the more the system can replicate reality at output, the more likely it is to replicate
reality internally. If necessary, engage in inverse modelling, as sketched below: initialize parameters randomly and then adjust them over multiple runs until they match all known outputs.
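As an illustration of the third mechanism, the sketch below shows inverse modelling in its simplest form. It assumes a hypothetical simulate() function standing in for the real model and a synthetic set of known outputs; parameters are initialized randomly and then adjusted over repeated runs by keeping only those random perturbations that bring the simulated outputs closer to the known ones.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(params):
    """Hypothetical stand-in for the model: maps a parameter vector to a
    vector of outputs. In practice this would be a full model run."""
    a, b = params
    x = np.linspace(0.0, 1.0, 20)
    return a * x + b * x ** 2

# Outputs the real system is known to have produced (synthetic here).
known_outputs = simulate(np.array([1.5, -0.7]))

def inverse_model(n_iterations=5000, tolerance=1e-3, step=0.05):
    """Initialize parameters randomly, then adjust them over repeated runs,
    keeping any perturbation that brings the simulated outputs closer to
    the known outputs (a simple random-walk hill climb)."""
    params = rng.uniform(-2.0, 2.0, size=2)          # random initialization
    error = np.mean((simulate(params) - known_outputs) ** 2)
    for _ in range(n_iterations):
        candidate = params + rng.normal(0.0, step, size=2)
        candidate_error = np.mean((simulate(candidate) - known_outputs) ** 2)
        if candidate_error < error:                  # keep improvements only
            params, error = candidate, candidate_error
        if error < tolerance:                        # close enough to the known outputs
            break
    return params, error

fitted, fit_error = inverse_model()
print("fitted parameters:", fitted, "error:", fit_error)
```

In a real application the random-walk adjustment would be replaced by whatever calibration machinery the model already uses; the essential point is the loop from random initialization to parameters that reproduce all known outputs.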
Of these, by far the easiest, but the least engaged with, is checking the stability of
the model in parameter space (see Evans 2012 for a review). Various AI techniques
have been applied to the problem of optimising parameters to fit model output
distributions to some predetermined pattern (such as a “real-world” distribution).
However, the stability of these parameterizations and the paths AIs take to generate them are rarely used to examine the degree to which the model fluctuates between different states, let alone to reflect on the nature of the system.
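As a minimal sketch of how this information might be used, assuming the same toy simulate() stand-in as the earlier example, the calibration can be repeated from many random starting points and the spread of the resulting parameterizations summarized: tight clustering suggests a stable, well-identified fit, while a wide spread suggests that quite different internal states reproduce the same outputs.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def simulate(params):
    """Hypothetical stand-in for the model, as in the previous sketch."""
    a, b = params
    x = np.linspace(0.0, 1.0, 20)
    return a * x + b * x ** 2

known_outputs = simulate(np.array([1.5, -0.7]))

def error(params):
    """Mismatch between simulated and known outputs."""
    return np.mean((simulate(params) - known_outputs) ** 2)

# Calibrate repeatedly from different random starting points and keep every
# resulting parameterization, not just the single best one.
fits = []
for _ in range(50):
    start = rng.uniform(-2.0, 2.0, size=2)
    result = minimize(error, start, method="Nelder-Mead")
    fits.append(result.x)
fits = np.array(fits)

# The spread of the fitted values is the diagnostic of interest: a small
# standard deviation across restarts suggests a stable parameterization,
# while a large one suggests many different internal states reproduce
# the same outputs.
print("mean fitted parameters:", fits.mean(axis=0))
print("std dev across restarts:", fits.std(axis=0))
```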
The assumption of identifiability is that the more parameterized a model, the more likely it is that a set of parameter values can be derived which fit the data but don't represent the true values. However, in practice the limits on the range of parameter values within any given model allow us an alternative viewpoint: that the more parameterized rules in a model, the more the system is constrained by the potential range of the elements