Page 154 -
8 The Importance of Ontological Structure: Why Validation by ‘Fit-to-Data’... 151
• Good models fit the data (G → F).
• My model fits the data (F).
and concluding that
• My model is a good model (∴ G).
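This inference is the classic fallacy of affirming the consequent, and its invalidity can be checked mechanically. The sketch below (an illustration, not from the chapter) enumerates all truth assignments for G and F and looks for a case where both premises hold but the conclusion fails:

```python
from itertools import product

# Affirming the consequent: from 'G -> F' and 'F', conclude 'G'.
# Enumerate all truth assignments for G (good model) and F (fits data)
# and collect cases where both premises hold but the conclusion fails.
counterexamples = [
    (G, F)
    for G, F in product([False, True], repeat=2)
    if ((not G) or F)   # premise 1: G -> F (good models fit the data)
    and F               # premise 2: F (my model fits the data)
    and not G           # conclusion G is false
]
print(counterexamples)  # [(False, True)]: a model that fits but is not good
```

The single counterexample, G false with F true, is exactly the case the chapter warns about: a model that fits the data without being a good model.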
Oreskes et al. (1994) assert that (prejudices such as Ockham’s razor aside)
in closed systems, only good models fit the data (F → G); in open systems, the
observed data could have been affected by external influences outside the system.
When fitting functions to data from complex open systems (such as social and
ecological systems), the ability to exclude or control for external influences is highly
constrained. A model of a subsystem that just fits to data will likely also be fitting
to external influences on that subsystem.
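As a purely illustrative sketch (the sinusoidal 'subsystem' and linear 'external drift' are made-up signals, not anything from the chapter), the following shows a flexible curve fitted only to observed data absorbing a hypothetical exogenous influence along with the subsystem it was meant to capture:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)

subsystem = np.sin(t)   # the process the model is meant to capture
external = 0.3 * t      # illustrative exogenous drift from outside the subsystem
data = subsystem + external + rng.normal(0.0, 0.1, t.size)

# Fit a flexible curve purely to the observed data (validation by fit-to-data).
coeffs = np.polyfit(t, data, deg=5)
fitted = np.polyval(coeffs, t)

# The fitted model tracks subsystem + external influence, not the subsystem alone.
err_vs_combined = np.mean((fitted - (subsystem + external)) ** 2)
err_vs_subsystem = np.mean((fitted - subsystem) ** 2)
print(err_vs_combined < err_vs_subsystem)  # True: the fit has absorbed the drift
```

Fit-to-data alone cannot tell us which of the two components the model has actually learned.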
If a model somehow captures the effect of an external influence that it is not
supposed to model, we should be rather suspicious. Further, as Filatova et al. (2016)
point out, disturbances to a complex socioecological system need not only arise
from exogenous influences but can also grow from endogenous gradual change. If
there are multiple ‘attractors’ and the data have followed one path at a bifurcation
but a model follows another, the model will fail to validate. Over multiple runs of the
model, of course, it might take the same path as the data did half the time. Given the
choice between two models, one of which is simpler, and always follows the path
the data did (because it is high bias and doesn’t bifurcate), and another of which is
more complicated, and only follows the path the data did half the time, Ockham’s
razor and fit-to-data heuristics tell us to choose the former. However, it is arguably
the latter model that has more faithfully captured the underlying dynamics of the
system.
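The two-model comparison can be caricatured in a few lines (the 'up'/'down' branch labels and the 0.5 branching probability are illustrative assumptions):

```python
import random

random.seed(1)

# Hypothetical caricature: the real system bifurcated, and the observed
# data happened to take the 'up' branch.
observed_path = "up"

def simple_model():
    # High-bias model: no bifurcation, always produces the 'up' branch.
    return "up"

def bifurcating_model():
    # Captures the bifurcation: each run picks a branch with probability 0.5.
    return "up" if random.random() < 0.5 else "down"

runs = 10_000
simple_hits = sum(simple_model() == observed_path for _ in range(runs))
bifurcating_hits = sum(bifurcating_model() == observed_path for _ in range(runs))

print(simple_hits / runs)       # 1.0: perfect fit-to-data on every run
print(bifurcating_hits / runs)  # roughly 0.5, despite the richer dynamics
```

By the fit-to-data heuristic the simple model wins every time, even though it is the bifurcating model that represents the system's dynamics.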
The probability of following one trajectory rather than another need not be 0.5.
It could be 10⁻⁶, and it just so happened that this time, the real
world followed the one-in-a-million chance trajectory. The model that captures the
bifurcation may not be run enough times for the path the data took to be observed. The
point remains that in complex systems, fit-to-data is not necessarily an indicator that
we have a ‘good’ model. If our model is ontology-free, then it is doubly awful, an
oversimplified bendy sheet that hardly reflects the system it is modelling: ‘It is a tale
told by an idiot, full of sound and fury, signifying nothing'.⁴
To summarize, validation by fit-to-data is not necessarily (on its own) a helpful
measure in complex systems. No matter what the outcome, there exists an argument
both for and against the model (Table 8.1). Nevertheless, it still provides potentially
useful information about a model, and we show in the box various methods for computing
validation error on a set of data or otherwise comparing models’ expected prediction
ability. As is apparent from reading Brewer et al. (2016), there is controversy
in some of the modelling literatures about which measure of expected prediction
ability is ‘best’. This can lead to reviewers complaining that one measure should
4 Macbeth, Act V, Scene V.