
From the results in Table 11.2 it is immediately apparent that there are differences between the two models. The under-powered Model 5 is unable to detect some of the effects that are captured by the more balanced Model 40. In fact, Model 5 fails to identify as statistically significant both the relation between hierarchy with competence (HC) and anarchy (AR) and the relation between hierarchy with incompetence (HI) and anarchy (AR). In other words, the null hypothesis was accepted when (probably) false, a Type II error. And we know that this is the case because a very similar regression coefficient (β_HC/AR = 0.007, St. err. = 0.003) leads instead to the rejection of the null hypothesis (that the corresponding parameter is zero) in Model 40, where it is more reasonable to suppose that power requirements are met. The second coefficient, hierarchy with incompetence on anarchy, is also statistically significant in Model 40 (β_HI/AR = 0.012, St. err. = 0.003), as opposed to Model 5 (β_HI/AR = 0.012, St. err. = 0.008).
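The arithmetic behind this comparison can be verified with a minimal sketch in Python. Since degrees of freedom are not reported here, a large-sample normal approximation to the t distribution is assumed:

    from scipy.stats import norm

    def two_sided_p(beta, se):
        # the t statistic is the coefficient over its standard error;
        # only the standard error differs between the two models
        z = beta / se
        return z, 2 * norm.sf(abs(z))

    for label, beta, se in [("Model 40, HI/AR", 0.012, 0.003),
                            ("Model 5,  HI/AR", 0.012, 0.008)]:
        z, p = two_sided_p(beta, se)
        print(f"{label}: t = {z:.2f}, p = {p:.4f}")
    # Model 40: t = 4.00, p = 0.0001 (reject the null)
    # Model 5:  t = 1.50, p = 0.1336 (fail to reject: a Type II error)

The same coefficient thus clears the conventional significance threshold in Model 40 but not in Model 5, purely because of the difference in standard errors.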
Finally, note that in Model 5 the F-statistic for the joint nullity of both effects does not lead to the rejection of the null hypothesis, thus suggesting that the structure has no overall effect on problem solving. This conclusion is at odds with the one from Model 40, which leads to a strong rejection of the same hypothesis.
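For readers who want to reproduce a test of this kind, the following is a minimal sketch on hypothetical run-level data; the data frame and its coefficients are illustrative only, and the chapter's actual models include further regressors:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 40  # one observation per run, as in Model 40
    df = pd.DataFrame({"HC": rng.normal(size=n), "HI": rng.normal(size=n)})
    # outcome built with effect sizes of the same magnitude as in Table 11.2
    df["AR"] = (0.007 * df["HC"] + 0.012 * df["HI"]
                + rng.normal(scale=0.02, size=n))

    fit = smf.ols("AR ~ HC + HI", data=df).fit()
    print(fit.f_test("HC = 0, HI = 0"))  # joint nullity of both effects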
In short, the impact of some of the conditions goes unacknowledged in the under-powered study with only 5 runs, leaving important and interesting implications out of the analysis.



11.4.3 Example 2


We conduct a second example to illustrate the risks and problems of over-powering a simulation. In this case, we over-power the simulation by calculating results on 500 runs, with the same parameter specifications used in the example above.
The results of the two simulations are compared in Table 11.3, where we show the estimation outputs of two OLS regression models. In the table, Model 40 shows results for the correctly-powered simulation, while Model 500 refers to the over-powered simulation. The beta coefficients are very close to each other, and the variation is mostly reflected in the standard errors, which decrease in the over-powered simulation. This leads to larger t values, so that the respective probabilities (the p-values) are closer to zero for Model 500 than for Model 40.
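The mechanism is the usual one: the standard error of an estimated effect shrinks roughly as one over the square root of the number of runs R. A minimal sketch with hypothetical run-level noise (the effect and noise sizes are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    true_effect, noise_sd = 0.012, 0.02

    for runs in (40, 500):
        outcomes = true_effect + rng.normal(scale=noise_sd, size=runs)
        se = outcomes.std(ddof=1) / np.sqrt(runs)  # shrinks as 1/sqrt(R)
        print(f"R = {runs:3d}: estimate = {outcomes.mean():.4f}, "
              f"st. err. = {se:.4f}, t = {outcomes.mean() / se:.1f}")
    # the standard error falls by about sqrt(500/40), roughly 3.5 times,
    # and the t value grows accordingly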
From the perspective of accepting or rejecting hypotheses in the regression, there is little or no difference. In fact, most p-values are well below the threshold for statistical significance in both models. This means that, if one is interested only in accepting or rejecting hypotheses, there is no particular difference between the two models.
However, in another article (Secchi and Seri 2017), we warn modelers of the risks of over-power. There we write that over-power hides some dangers: it might be unnecessarily costly (time consuming, for example), it makes small effects appear as significant as larger ones, and it destroys the balance between the two probabilities of Type I and Type II errors.
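The second of these dangers, in particular, can be seen in a short sketch: with enough runs, even a negligible effect crosses the conventional 5% threshold. The effect and noise sizes below are hypothetical, chosen for illustration:

    import numpy as np
    from scipy.stats import norm

    tiny_effect, noise_sd = 0.002, 0.02

    for runs in (5, 40, 500):
        se = noise_sd / np.sqrt(runs)  # standard error shrinks with more runs
        t = tiny_effect / se
        print(f"R = {runs:3d}: t = {t:.2f}, p = {2 * norm.sf(t):.4f}")
    # R =   5: t = 0.22, p = 0.8230
    # R =  40: t = 0.63, p = 0.5271
    # R = 500: t = 2.24, p = 0.0253  (the tiny effect now looks significant)

The practical importance of the effect is unchanged; only the number of runs has made it statistically significant.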