
            output behaviour? Axtell et al. (1996) defined three kinds of equivalence or levels
            of similarity between model outputs: numerical identity, relational alignment and
            distributional equivalence. The first, numerical identity, implies exact numerical
            output and is difficult to demonstrate for stochastic models in general and social
            complexity models in particular. Relational alignment between outputs exists if they
show qualitatively similar dependencies on input data; this is frequently the only way to compare a model with another that is inaccessible (e.g. the implementation has not been made available by the original author), or with a non-controllable
            “real” social system. Lastly, distributional equivalence between implementations
            is achieved when the distributions of results cannot be statistically distinguished.
This shows that, at conventional confidence levels, the statistics from different implementations are consistent with having come from the same distribution, but it does not prove that this is actually the case. In other words, it does not prove that the two implementations are algorithmically equivalent. Nonetheless, demonstrating
            equivalence for a larger number of parametrisations increases the confidence that
            the implementations are in fact globally equivalent (Edmonds and Hales 2003).
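
As a minimal illustration of such a check, the following R sketch applies a two-sample Kolmogorov-Smirnov test to one focal measure collected from two implementations. The data here are simulated placeholders standing in for actual model output; failing to reject the null hypothesis is consistent with, but does not prove, distributional equivalence.

# Minimal sketch of a distributional equivalence check in R.
# fm_original and fm_replicate stand in for 30 replicate values of a
# single focal measure from each implementation; they are simulated
# placeholders, not actual model output.
set.seed(1234)
fm_original  <- rnorm(30, mean = 100, sd = 10)
fm_replicate <- rnorm(30, mean = 100, sd = 10)

# Two-sample Kolmogorov-Smirnov test: the null hypothesis is that both
# samples come from the same distribution. A p-value above the chosen
# significance level (e.g. 0.05) is consistent with distributional
# equivalence, but does not prove it.
print(ks.test(fm_original, fm_replicate))

In practice this comparison would be repeated for each focal measure and each parametrisation of interest, in line with the point made above about confidence growing with the number of parametrisations tested.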
              Since numerical identity is difficult to attain, and is not critical for showing that
            two such models have the same dynamic behaviour, distributional equivalence is
            more often than not the appropriate standard when comparing two implementations
of a stochastic social complexity model. When aiming for distributional equivalence, a set of statistical summaries representative of each output is selected. It is
            these summaries, and not the complete outputs, that will be compared in order to
            assess the similarity between the original computational model and the replicated
            one. As models may produce large amounts of data, the summary measures should
be chosen so as to be relevant to the actual modelling goal. The summaries of all model
            outputs constitute the set of focal measures (FMs) of a model (Wilensky and Rand
            2007), or more specifically, of a model parametrisation (since different FMs may be
            selected for distinct parametrisations). However, this process is empirically driven
and model-dependent, or even parameter-dependent. Furthermore, it is sometimes unclear which output features best describe model behaviour. A possible
            solution, presented by Arai and Watanabe (2008) in the context of comparing
            models with different elements, is the automatic extraction of FMs from time-
            series simulation output using the discrete Fourier transform. Fachada et al. (2017)
            proposed a similarly automated method, using principal component analysis to
            convert simulation output into a set of linearly uncorrelated statistical measures,
analysable in a consistent, model-independent fashion. The proposed method is broader in scope, supporting multiple outputs and different types of data, and is available as a software package for the R platform (Fachada et al. 2016; R Core Team 2017).
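
The general idea behind such a PCA-based comparison can be sketched in a few lines of base R. The fragment below uses placeholder time-series data and standard prcomp/t.test calls; it illustrates the principle only, not the actual interface of the package cited above.

# Sketch of the PCA idea: reduce replicate time-series output from two
# implementations to linearly uncorrelated scores, then compare the
# scores with a standard test. Placeholder data, not real model output.
set.seed(5678)
runs_a <- t(replicate(20, cumsum(rnorm(100))))  # 20 runs, implementation A
runs_b <- t(replicate(20, cumsum(rnorm(100))))  # 20 runs, implementation B

# Each run (row) is one observation; PCA yields uncorrelated scores.
pca <- prcomp(rbind(runs_a, runs_b))
pc1 <- pca$x[, 1]                     # scores on the first component

# Test whether the two implementations differ on the first component;
# further components can be tested in the same way.
groups <- factor(rep(c("A", "B"), each = 20))
print(t.test(pc1 ~ groups))

Because the principal components are uncorrelated by construction, each can be tested separately, which is what makes the approach applicable in a consistent, model-independent fashion.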
              Once the FMs are extracted from simulation output, there are three major
            statistical approaches used to compare them: (1) statistical hypothesis tests; (2)
            confidence intervals; and (3) graphical methods (Balci and Sargent 1984). Statistical
            hypothesis tests are often used for comparing two or more computational models
            (Axtell et al. 1996; Wilensky and Rand 2007; Edmonds and Hales 2003; Miodownik
            et al. 2010; Radax and Rengs 2009; Fachada et al. 2017). More specifically,