11.4.2 Example 1
We performed a simulation for the ABM version of the GCM (Fioretti and Lomi
2010), using the second version of the two models uploaded on the NetLogo
community platform. The model has three overall conditions (anarchy, hierarchy-
competence, and hierarchy-incompetence), and each of these has four parameter
configurations, with buck passing ∈ {true, false} and postpone ∈ {true, false}. We
decided to test a simple case, setting both parameters to false. This gives a design of
3 configurations of parameters (CoP). Each run had 5000 steps, as per the original
simulation (Fioretti and Lomi 2010).
Power analysis should be performed before obtaining data from the model, in order
to choose how many times a simulation should be run. To do that, a few elements
need to be determined. First of all, the researcher should choose a certain number G
of configurations of parameters (also called groups). Then, considering the nature of
the model or previous simulations, one should guess a value of the effect size that,
in the case of ANOVA, is identified by the letter f (Cohen 1988; Liu 2014). Finally,
one should choose a level for α and a corresponding goal for the level of power
(i.e., 1 − β) to be achieved. Although the power threshold of 1 − β for empirical
research is set at 0.80, some (Secchi and Seri 2014, 2017) argue that it can be set
at 0.95 for simulations, because the control exerted on variables and parameters is
much higher than that usually in place in empirical research. Consistently with this,
the threshold for α can also be set at the more stringent level of 0.01 (Secchi and
Seri 2017).¹¹
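
As an illustration of this procedure, the sketch below uses Python's statsmodels to solve for the number of runs per CoP. The effect size f = 0.25 (a "medium" effect in Cohen's convention) is an assumption chosen purely for illustration, combined with the stricter thresholds α = 0.01 and 1 − β = 0.95 discussed above.

```python
# A minimal power-analysis sketch; the effect size is an assumed value, not an
# estimate obtained from the GCM model.
import math
from statsmodels.stats.power import FTestAnovaPower

G = 3        # number of groups/CoP, as in the design above
f = 0.25     # assumed effect size (Cohen's f), "medium" by convention
alpha = 0.01 # stricter significance threshold proposed for simulations
power = 0.95 # stricter power goal proposed for simulations

# solve_power returns the total sample size N needed to reach the desired power
N_total = FTestAnovaPower().solve_power(effect_size=f, alpha=alpha,
                                        power=power, k_groups=G)

n_per_cop = math.ceil(N_total / G)   # runs needed in each configuration of parameters
print(f"Total N = {math.ceil(N_total)}, i.e. about {n_per_cop} runs per CoP")
```

The same call can be repeated with smaller or larger values of f to see how sharply the required number of runs depends on the hypothesized effect size.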
As explained above, the dependent variable is the ratio r_{ro} of decisions made by
resolution in relation to those made by oversight. The differences in its average
value across the three CoP can easily be explored by performing a one-way ANOVA
with the null hypothesis that the expected value is the same across conditions. We
set some notation. If G denotes the number of groups/CoP and n the number of
observations per CoP, the sample size N turns out to be N = n × G.
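
As a sketch of this step, the snippet below runs a one-way ANOVA on hypothetical values of r_{ro} for the three CoP; the group means, standard deviation, and number of runs per group are placeholders, not output from the GCM model.

```python
# One-way ANOVA on the ratio r_ro across the three CoP.
# The data are randomly generated placeholders, not actual GCM output.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)
n = 50  # hypothetical number of runs per CoP (to be fixed via power analysis)

# Simulated r_ro values for the three conditions (assumed means for illustration)
anarchy               = rng.normal(loc=1.2, scale=0.3, size=n)
hierarchy_competent   = rng.normal(loc=1.5, scale=0.3, size=n)
hierarchy_incompetent = rng.normal(loc=1.1, scale=0.3, size=n)

# H0: the expected value of r_ro is the same across the G = 3 conditions
f_stat, p_value = f_oneway(anarchy, hierarchy_competent, hierarchy_incompetent)
print(f"N = {3 * n}, F = {f_stat:.2f}, p = {p_value:.4f}")
```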
¹¹ In an interesting exchange with Bruce Edmonds, we came to realize that this approach might
raise some important issues. One of the concerns is that thresholds do not usually adjust because
the experiment is so well planned that results come out extremely clear; that is to say, good
experimental work still accepts or rejects hypotheses at the level α < 0.05 with 1 − β ≥ 0.80. This
implies that adjustments of these levels for simulation work appear to be arbitrary. Our position on
this critique is that thresholds actually do change, as happens in some medical studies, where 1 − β
rises to 0.90 (Lakatos 2005), or when we listen to the calls not to interpret the traditional choices
of α levels as absolute, coming from either social scientists (Gigerenzer 2004) or statisticians (Wasserstein
and Lazar 2016). While a complete review of the reasons leading to the traditional choices of α and
β is in Secchi and Seri (2017), the introduction to testing theory above should have made clear that
the fathers of this theory thought of α and β as quantities to be chosen according to the problem at
hand. This justifies our proposals, as long as we cannot compare artificial computational experiments
to real-life experiments because of the different variability of observations, the observer's control and role,
and the usual difficulty of increasing sample size in empirical experiments.