
11 How Many Times Should One Run a Computational Simulation?    245

            case, Example 1 shows that results are unreliable and that one might discard effects
            that are, in fact, relevant to the study. At the same time, Example 2 shows that, for
            studies with large effect sizes, over-powering does not pose a serious threat to the
            overall reliability of a study.
              In any case, knowing what makes an ABM more likely to produce reliable
            results is relevant information for modelers. This is especially so when modelers
            run their simulation only a limited number of times per configuration of parameters.
            But even when too many runs are performed, the absence of power calculations
            may mislead one’s judgement on the effects and actual meaning of the simulation.
            However, the asymmetry between the effects of under- and over-powering suggests
            that power analysis can be used to provide, if not a precise estimate, at least a
            lower bound on the number of runs (see the concept of SESOI introduced above). The
            value calculated with the aid of statistical power analysis is a number that, even
            if not taken at face value, should inform the choice of the number of runs, and
            could at least work as a benchmark.
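
              A minimal sketch of how such a lower bound might be computed is given below. It
            assumes, for illustration only, that two parameter configurations are compared on
            a standardized mean difference d with a two-sided test under a normal
            approximation. The function name and the example values of d, α, and power are
            hypothetical choices, not the chapter's own procedure.

```python
from math import ceil
from statistics import NormalDist


def min_runs_per_config(d, alpha, power):
    """Approximate number of runs per configuration needed to detect a
    standardized difference d between two configurations.

    Assumption (not from the chapter): two-sided two-sample comparison with
    a normal approximation, n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2.
    """
    if d <= 0:
        raise ValueError("d must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    # Illustrative effect sizes only; the SESOI should come from the study itself.
    for d in (0.1, 0.3, 0.5):
        print(d, min_runs_per_config(d, alpha=0.01, power=0.90))
```

              Under these assumptions the required number of runs grows with 1/d², so
            halving the smallest effect of interest roughly quadruples the lower bound.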
              In a review of models published mostly in Computational and Mathematical
            Organization Theory (CMOT) and in the Journal of Artificial Societies and Social
            Simulation (JASSS) between 2010 and 2013 (Secchi and Seri 2017) it was found
            that most models are under-powered. If a small effect size d = 0.1 is hypothesized,
            then the average power is 1 − β ≈ 0.41, while if a medium effect size d = 0.3 is
            taken, then power becomes 1 − β ≈ 0.84 (with α = 0.01). In both cases, the review
            shows that models are under-powered even by the milder standards of 1 − β = 0.90
            suggested in Ritter et al. (2011).
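
              To make the link between the number of runs and power concrete, the sketch
            below approximates the power reached with a given number of runs per
            configuration. It assumes a two-sided comparison of two configurations under a
            normal approximation, which is not necessarily the test used in the review; the
            function name and the run counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, runs_per_config, alpha):
    """Approximate power (1 - beta) of a two-sided two-sample comparison.

    Assumption (not from the review): normal approximation with equal
    numbers of runs in the two configurations being compared.
    """
    if runs_per_config <= 0:
        raise ValueError("runs_per_config must be positive")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(d) * sqrt(runs_per_config / 2)  # standardized distance from the null
    # Probability of rejecting in either tail when the true effect is d.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical run counts, shown only to illustrate how power changes.
    for runs in (10, 100, 1000):
        for d in (0.1, 0.3):
            print(runs, d, round(approx_power(d, runs, alpha=0.01), 3))
```

              A model is under-powered in the sense used here whenever the value returned
            for the hypothesized d falls below the chosen standard for 1 − β.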



            11.5.1 Comparing Statistical Power to Other Approaches


            Using power is not the only way in which one can determine the number of runs in
            an experimental study and, in particular, in an ABM.
              As an example, another approach, sometimes called accuracy in parameter
            estimation (AIPE) (Maxwell et al. 2008), has been proposed. In this approach,
            the researcher first identifies a quantity of interest (a coefficient in a regression, a
            correlation, etc.) and chooses the desired width of a confidence interval around this
            value. Then, the researcher selects the sample size that allows one to reach this
            objective. The technique is already established, under different names, in medicine
            (Bland 2009), engineering (Hahn and Meeker 2011, Sect. 8.3), and psychology
            (Maxwell et al. 2008). A similar approach, putting together AIPE and power
            analysis, has also been proposed in the context of simulation models in Ritter et al.
            (2011).
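
              The two AIPE steps described above can be sketched for the simplest case, the
            mean of an outcome across runs. The sketch assumes a normal-theory confidence
            interval and a known or pilot-estimated standard deviation; the function name,
            `sigma`, and `full_width` are hypothetical, and other quantities of interest
            (regression coefficients, correlations) would need their own formulas.

```python
from math import ceil
from statistics import NormalDist


def runs_for_ci_width(sigma, full_width, confidence=0.95):
    """Number of runs so that a normal-theory confidence interval for a mean
    is no wider than full_width.

    Assumptions (illustrative only): sigma is the standard deviation of the
    outcome across runs, e.g. estimated from a pilot batch, and the interval
    is mean +/- z * sigma / sqrt(n), so its full width is 2 * z * sigma / sqrt(n).
    """
    if sigma <= 0 or full_width <= 0:
        raise ValueError("sigma and full_width must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((2 * z_crit * sigma / full_width) ** 2)


if __name__ == "__main__":
    # Step 1: choose the desired interval width; step 2: solve for the runs.
    print(runs_for_ci_width(sigma=1.0, full_width=0.2))
```

              Unlike a power calculation, this criterion involves no hypothesized effect size:
            only the variability of the outcome and the desired precision enter the formula.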
              However, we think that, in order to become a feasible option for ABM, this
            method must overcome some difficulties. First, AIPE is certainly of interest
            whenever the objective of the analysis is to obtain a sufficiently precise measure of the
            effect of a treatment (see above for references). However, most ABM studies are not
            framed in this way (see the distinction between KISS and KIDS above).