11.4.2 Example 1
We performed a simulation for the ABM version of the GCM (Fioretti and Lomi
2010), using the second version of the two models uploaded on the NetLogo
community platform. The model has three overall conditions (anarchy, hierarchy-
competence, and hierarchy-incompetence), and each of these has four parameter
configurations, with buck passing ∈ {true, false} and postpone ∈ {true, false}. We
decided to test a simple case, setting both parameters to false. This gives a design of
3 configurations of parameters (CoP). Each run had 5000 steps, as per the original
simulation (Fioretti and Lomi 2010).
Power analysis should be performed before obtaining data from the model, in order
to choose how many times a simulation should be run. To do that, a few elements
need to be determined. First of all, the researcher should choose a certain number G
of configurations of parameters (also called groups). Then, considering the nature of
the model or previous simulations, one should guess a value of the effect size that,
in the case of ANOVA, is identified by the letter f (Cohen 1988; Liu 2014). Finally,
one should choose a level for α and a corresponding goal for the level of power
(i.e., 1 − β) to be achieved. Although the power threshold of 1 − β for empirical
research is set at 0.80, some (Secchi and Seri 2014, 2017) argue that it can be set
at 0.95 for simulations, because the control exerted on variables and parameters is
much higher than that usually in place in empirical research. Consistently with this,
the threshold for α can also be set at the more stringent level of 0.01 (Secchi and
Seri 2017).¹¹
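
As an illustration of this procedure, the sketch below uses Python's statsmodels to solve for the number of runs per CoP. The effect size f = 0.25 (a "medium" effect in Cohen's convention) is an assumption chosen purely for illustration, combined with the stricter thresholds α = 0.01 and 1 − β = 0.95 discussed above.

```python
# A minimal power-analysis sketch; the effect size is an assumed value, not an
# estimate obtained from the GCM model.
import math
from statsmodels.stats.power import FTestAnovaPower

G = 3        # number of groups/CoP, as in the design above
f = 0.25     # assumed effect size (Cohen's f), "medium" by convention
alpha = 0.01 # stricter significance threshold proposed for simulations
power = 0.95 # stricter power goal proposed for simulations

# solve_power returns the total sample size N needed to reach the desired power
N_total = FTestAnovaPower().solve_power(effect_size=f, alpha=alpha,
                                        power=power, k_groups=G)

n_per_cop = math.ceil(N_total / G)   # runs needed in each configuration of parameters
print(f"Total N = {math.ceil(N_total)}, i.e. about {n_per_cop} runs per CoP")
```

The same call can be repeated with smaller or larger values of f to see how sharply the required number of runs depends on the hypothesized effect size.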
As explained above, the dependent variable is the ratio r_{ro} of decisions made by
resolution in relation to those made by oversight. The differences in its average
value across the three CoP can easily be explored by performing a one-way ANOVA
with the null hypothesis that the expected value is the same across conditions. We
set some notation. If G denotes the number of groups/CoP and n the number of
observations per CoP, the sample size N turns out to be N = n × G.
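
As a sketch of this step, the snippet below runs a one-way ANOVA on hypothetical values of r_{ro} for the three CoP; the group means, standard deviation, and number of runs per group are placeholders, not output from the GCM model.

```python
# One-way ANOVA on the ratio r_ro across the three CoP.
# The data are randomly generated placeholders, not actual GCM output.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)
n = 50  # hypothetical number of runs per CoP (to be fixed via power analysis)

# Simulated r_ro values for the three conditions (assumed means for illustration)
anarchy               = rng.normal(loc=1.2, scale=0.3, size=n)
hierarchy_competent   = rng.normal(loc=1.5, scale=0.3, size=n)
hierarchy_incompetent = rng.normal(loc=1.1, scale=0.3, size=n)

# H0: the expected value of r_ro is the same across the G = 3 conditions
f_stat, p_value = f_oneway(anarchy, hierarchy_competent, hierarchy_incompetent)
print(f"N = {3 * n}, F = {f_stat:.2f}, p = {p_value:.4f}")
```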
¹¹ In an interesting exchange with Bruce Edmonds, we came to realize that this approach might
raise some important issues. One of the concerns is that thresholds do not usually adjust because
the experiment is so well planned that results come out extremely clear; that is to say, good
experimental work still accepts or rejects hypotheses at the level α < 0.05 with 1 − β ≥ 0.80. This
implies that adjustments of these levels for simulation work appear to be arbitrary. Our position on
this critique is that thresholds actually do change, as happens in some medical studies, where 1 − β
rises to 0.90 (Lakatos 2005), or when we listen to the calls not to interpret the traditional choices
of α levels as absolute, coming from either social scientists (Gigerenzer 2004) or statisticians (Wasserstein
and Lazar 2016). While a complete review of the reasons leading to the traditional choices of α and
β is in Secchi and Seri (2017), the introduction to testing theory above should have made clear that
the fathers of this theory thought of α and β as quantities to be chosen according to the problem at
hand. This justifies our proposals, as long as we cannot compare artificial computational experiments
to real-life experiments because of the different variability of observations, the observer's control and role,
and the usual difficulty of increasing sample size in empirical experiments.