The reason is that ABM studies are often simplified representations of reality. Therefore, the effect of a treatment is rarely the desired outcome, as the value obtained from an ABM will generally not match the value observed in reality. Second, even when the outcome of an ABM study is of interest in itself, it is rarely the case that one has a precise idea of what the width of a confidence interval should be. This may be different whenever the outcome variable is measured on a well-known scale, as is often the case in the disciplines in which AIPE is an established alternative to power analysis. Schönbrodt and Perugini (2013) (see also Lakens and Evers 2014) provide an interesting example, based on Cohen (1988), of how to determine the width of an interval, but this seems difficult to generalize to other situations.
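To see why a target width matters, consider the simplest AIPE setting, in which the number of observations is chosen so that the confidence interval for a mean attains a prespecified width. The following is a minimal sketch, assuming the large-sample formula for the confidence interval of a mean; the standard deviation and the target width are hypothetical inputs that the modeler must supply, which is precisely the difficulty noted above.

```python
# Minimal AIPE sketch: smallest n such that the two-sided confidence interval
# for a mean, of width 2 * z * sigma / sqrt(n), is no wider than a target width.
# sigma and width are hypothetical values chosen for illustration.
import math
from scipy import stats

def aipe_n(sigma, width, conf=0.95):
    z = stats.norm.ppf(1 - (1 - conf) / 2)   # two-sided critical value
    return math.ceil((2 * z * sigma / width) ** 2)

# Example: outcome standard deviation of 10, desired interval width of 2.
print(aipe_n(sigma=10, width=2))             # -> 385
```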
11.5.2 Concluding Remarks
The message of this chapter is that statistical power analysis can help modelers refine their ideas on how many times their ABM simulation should be performed. We first offered a few notes on the importance of determining the number of runs, and then turned our attention to the types of models that would benefit the most from this approach. The focus then moved to testing theory, so as to provide an appropriate statistical background for the approach. Finally, some practical examples showed the risks and perils of under- or over-estimating the number of runs in a simulation. The implications were then discussed further at the beginning of this section.
As a way to summarize this chapter and, at the same time, help modelers clarify what under- and over-power imply, Table 11.4 shows power calculations for α = 0.01 and 1 − β = 0.95, using the formula that we developed, which also appears in the Appendix.
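For readers who wish to reproduce this kind of calculation, the following is a minimal sketch in Python. It assumes the standard one-way ANOVA power computation, with the G parameter configurations as groups and Cohen's f as the effect size; the Appendix formula may differ in its approximations, so the resulting counts need not match Table 11.4 exactly. The function name and the simple search loop are illustrative choices.

```python
# Sketch: smallest number of runs per parameter configuration such that a
# one-way ANOVA across G configurations reaches the target power.
# Assumes Cohen's effect size f and the noncentral-F power computation;
# this illustrates the kind of calculation behind Table 11.4, not the
# Appendix formula itself.
from scipy import stats

def runs_per_configuration(G, f, alpha=0.01, power=0.95, n_max=10**6):
    for n in range(2, n_max + 1):                  # n >= 2 for a within-group variance
        df1, df2 = G - 1, G * (n - 1)              # ANOVA degrees of freedom
        nc = G * n * f**2                          # noncentrality parameter
        f_crit = stats.f.ppf(1 - alpha, df1, df2)  # critical value under H0
        if stats.ncf.sf(f_crit, df1, df2, nc) >= power:
            return n
    raise ValueError("target power not reached within n_max runs")
```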
The left column in Table 11.4 shows the hypothetical number of parameter configurations (or groups G) that a potential ABM could have. Determining the appropriate number of configurations is a complex issue that falls beyond the scope of this chapter; however, sensitivity and steady-state analyses can provide sound support (Thiele et al. 2015). The table reports the number of runs necessary to reach 1 − β = 0.95 at α = 0.01 for six different effect sizes: ultra-micro (0.01), micro (0.05), small (0.1), medium (0.2), large (0.4), and huge (0.8). These calculations confirm, in greater detail, that small simulations with few configurations of parameters (up to 10) need to be performed many times unless the effect size is large or very large. As the number of configurations grows, the number of runs to perform decreases markedly, to the point where one run per configuration is enough when the runs are spread across very many configurations (1,000 and up) in the presence of large and very large effect sizes.
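The pattern just described can be checked with the sketch above, here for a large effect size; the exact counts depend on the formula actually used.

```python
# Runs per configuration shrink as the number of configurations G grows,
# shown here for a large effect size (f = 0.4); uses runs_per_configuration
# from the sketch above.
for G in (5, 100, 1000):
    print(G, runs_per_configuration(G, f=0.4))
```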