output behaviour? Axtell et al. (1996) defined three kinds of equivalence or levels
of similarity between model outputs: numerical identity, relational alignment and
distributional equivalence. The first, numerical identity, implies exact numerical
output and is difficult to demonstrate for stochastic models in general and social
complexity models in particular. Relational alignment between outputs exists if they
show qualitatively similar dependencies on the input data; this is frequently the only
way to compare a model with another that is inaccessible (e.g. its implementation
has not been made available by the original author), or with a non-controllable
“real” social system. Lastly, distributional equivalence between implementations
is achieved when the distributions of results cannot be statistically distinguished.
This shows that, at conventional confidence levels, the statistics
from the different implementations may come from the same distribution, but it does
not prove that this is actually the case. In other words, it does not prove that the
two implementations are algorithmically equivalent. Nonetheless, demonstrating
equivalence for a larger number of parametrisations increases the confidence that
the implementations are in fact globally equivalent (Edmonds and Hales 2003).
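As an illustration of such a test, the following sketch compares one hypothetical focal measure (summarised to a single value per independent replication) across two implementations using a two-sample Kolmogorov-Smirnov test. The synthetic data, variable names and the particular test are assumptions chosen for the sake of a self-contained example; they are not taken from the studies cited above.

```python
# Minimal sketch: testing distributional equivalence of one summary
# statistic (e.g. mean population size per run) obtained from two
# implementations of the same stochastic model. Each array holds one
# value per independent replication (placeholder data, for illustration).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
original_runs = rng.normal(loc=100.0, scale=5.0, size=30)
replication_runs = rng.normal(loc=100.0, scale=5.0, size=30)

# Two-sample Kolmogorov-Smirnov test: the null hypothesis is that both
# samples come from the same distribution.
statistic, p_value = stats.ks_2samp(original_runs, replication_runs)

alpha = 0.05
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.3f}")
if p_value < alpha:
    print("Distributions are statistically distinguishable at this level")
else:
    print("No evidence against distributional equivalence at this level")
```

Failing to reject the null hypothesis for a single focal measure and parametrisation is, of course, only weak evidence; in practice the same kind of test would be repeated over several focal measures and parametrisations, in line with the point made above.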
Since numerical identity is difficult to attain, and is not critical for showing that
two such models have the same dynamic behaviour, distributional equivalence is
more often than not the appropriate standard when comparing two implementations
of a stochastic social complexity model. When aiming for distributional equiva-
lence, a set of statistical summaries representative of each output is selected. It is
these summaries, and not the complete outputs, that will be compared in order to
assess the similarity between the original computational model and the replicated
one. As models may produce large amounts of data, the summary measures should
be chosen so as to be relevant to the actual modelling goal. The summaries of all model
outputs constitute the set of focal measures (FMs) of a model (Wilensky and Rand
2007), or more specifically, of a model parametrisation (since different FMs may be
selected for distinct parametrisations). However, this process is empirically driven
and model-dependent, or even parameter-dependent. Furthermore, it is sometimes
unclear which output features best describe model behaviour. A possible
solution, presented by Arai and Watanabe (2008) in the context of comparing
models with different elements, is the automatic extraction of FMs from time-
series simulation output using the discrete Fourier transform. Fachada et al. (2017)
proposed a similarly automated method, using principal component analysis to
convert simulation output into a set of linearly uncorrelated statistical measures,
analysable in a consistent, model-independent fashion. The proposed method is
broader in scope, supporting multiple outputs and different types of data, and is
available as a software package for the R platform (Fachada et al. 2016; R Core
Team 2017).
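The sketch below illustrates the general idea rather than the published implementation, which is distributed as an R package: replications from both implementations are stacked, projected onto a small number of principal components, and the two groups are then compared along each retained component. The synthetic data, the use of scikit-learn and SciPy, and the choice of the Mann-Whitney test on the component scores are all assumptions made for the sake of a self-contained example.

```python
# Rough sketch of PCA-based comparison of simulation outputs, in the
# spirit of the approach described above; illustrative only.
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_runs, n_steps = 30, 200

# Placeholder time-series output: one row per replication, one column per
# time step, for the original model (A) and the replication (B).
runs_a = np.cumsum(rng.normal(size=(n_runs, n_steps)), axis=1)
runs_b = np.cumsum(rng.normal(size=(n_runs, n_steps)), axis=1)

# Stack both groups and project onto principal components, which act as
# automatically extracted, linearly uncorrelated summary measures.
combined = np.vstack([runs_a, runs_b])
scores = PCA(n_components=2).fit_transform(combined)
labels = np.array([0] * n_runs + [1] * n_runs)

# Compare the two groups along each retained component with a
# non-parametric test; small p-values suggest the implementations differ.
for pc in range(scores.shape[1]):
    _, p = stats.mannwhitneyu(scores[labels == 0, pc], scores[labels == 1, pc])
    print(f"PC{pc + 1}: Mann-Whitney p-value = {p:.3f}")
```

Because the principal components are linearly uncorrelated by construction, each can be tested separately, which is what allows the comparison to proceed in a consistent, model-independent fashion.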
Once the FMs are extracted from simulation output, there are three major
statistical approaches used to compare them: (1) statistical hypothesis tests; (2)
confidence intervals; and (3) graphical methods (Balci and Sargent 1984). Statistical
hypothesis tests are often used for comparing two or more computational models
(Axtell et al. 1996; Wilensky and Rand 2007; Edmonds and Hales 2003; Miodownik
et al. 2010; Radax and Rengs 2009; Fachada et al. 2017). More specifically,