case, Example 1 shows that results are unreliable and one might discard effects that are, in fact, relevant to the study. At the same time, Example 2 shows that, for studies with large effect sizes, over-power does not pose a particularly serious threat to the overall reliability of a study.
In any case, knowing what makes the ABM more likely to produce reliable results is relevant information for modelers. This is especially true when modelers perform their simulation only a limited number of times per configuration of parameters. But even when too many runs are performed, the absence of power calculations may mislead one's judgement on the effects and actual meaning of the simulation.
However, the asymmetry of the effects between under- and over-power suggests that power analysis can be used to provide, if not an exact figure, at least a lower bound on the number of runs (see the concept of SESOI introduced above). The value calculated with the aid of statistical power analysis is a number that, even if not taken at face value, should inform the choice of the number of runs, and could at least work as a benchmark.
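To make this concrete, the following is a minimal sketch of such an a priori calculation, assuming, purely for illustration, that two parameter configurations of the ABM are compared with a two-sample t-test; the effect-size and error-rate values echo those used in this chapter, and the statsmodels Python library performs the computation.

    # A minimal sketch of an a priori power calculation for an ABM experiment,
    # assuming two parameter configurations are compared via a two-sample t-test.
    import math
    from statsmodels.stats.power import TTestIndPower

    d = 0.3       # smallest effect size of interest (SESOI), Cohen's d
    alpha = 0.01  # significance level
    power = 0.95  # target power, 1 - beta

    # Solve for the number of runs per configuration that reaches the target power
    n_runs = TTestIndPower().solve_power(effect_size=d, alpha=alpha, power=power)
    print(f"Lower bound on runs per configuration: {math.ceil(n_runs)}")

Read as a benchmark rather than a prescription, a figure of this kind helps flag experimental designs that are clearly under-powered.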
In a review of models published mostly in Computational and Mathematical Organization Theory (CMOT) and in the Journal of Artificial Societies and Social Simulation (JASSS) between 2010 and 2013 (Secchi and Seri 2017), it was found that most models are under-powered. If a small effect size d = 0.1 is hypothesized, then the average power is 1 − β ≈ 0.41, while if a medium effect size d = 0.3 is taken, then power becomes 1 − β ≈ 0.84 (with α = 0.01). In both cases, the review shows that models are under-powered even by the milder standard of 1 − β = 0.90 suggested in Ritter et al. (2011).
11.5.1 Comparing Statistical Power to Other Approaches
Statistical power is not the only tool for determining the number of runs in an experimental study and, in particular, in an ABM.
For example, another approach, sometimes called accuracy in parameter estimation (AIPE) (Maxwell et al. 2008), has been proposed. In this approach,
first the researcher identifies a quantity of interest (a coefficient in a regression, a
correlation, etc.) and chooses the desired width of a confidence interval around this
value. Then, the researcher selects the sample size that allows one to reach this
objective. The technique is already established, under different names, in medicine
(Bland 2009), engineering (Hahn and Meeker 2011, Sect. 8.3), and psychology
(Maxwell et al. 2008). A similar approach, putting together AIPE and power
analysis, has also been proposed in the context of simulation models in Ritter et al.
(2011).
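As a rough illustration of the AIPE logic (a sketch under simplifying assumptions, not the exact procedure of Maxwell et al. 2008), one can fix the desired half-width of a normal-approximation confidence interval for the mean of a simulation output and then solve for the number of runs, given a pilot estimate of the output's standard deviation:

    # A sketch of AIPE-style planning: choose the desired confidence-interval
    # half-width for the mean of a simulation output, then find the number of
    # runs that achieves it. sigma_pilot is a hypothetical value that would
    # come from a few pilot runs; the normal approximation is assumed.
    import math
    from scipy.stats import norm

    def runs_for_ci_half_width(sigma_pilot, half_width, confidence=0.95):
        z = norm.ppf(1 - (1 - confidence) / 2)  # two-sided critical value
        return math.ceil((z * sigma_pilot / half_width) ** 2)

    # e.g., pilot runs suggest sd = 2.0 and the mean should be known to +/- 0.25
    print(runs_for_ci_half_width(sigma_pilot=2.0, half_width=0.25))  # 246 runs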
However, we think that, in order to become a feasible option for ABM, this method should overcome some difficulties. First, AIPE is surely of interest whenever the objective of the analysis is to obtain a sufficiently precise measure of the effect of a treatment (see above for references). Yet most ABM studies are not framed in this way (see the distinction between KISS and KIDS above).