
From the results in Table 11.2 it is immediately apparent that there are differences between the two models. The under-powered Model 5 is unable to detect some of the effects that are captured by the more balanced Model 40. In fact, Model 5 fails to identify as statistically significant both the relation between hierarchy with competence (HC) and anarchy (AR) and the relation between hierarchy with incompetence (HI) and anarchy (AR). In other words, the null hypothesis was accepted when (probably) false, a Type II error. And we know that this is the case because a very similar regression coefficient (β_HC/AR = 0.007, St. err. = 0.003) leads instead to the rejection of the null hypothesis (that the corresponding parameter is zero) in Model 40, where it is more reasonable to suppose that power requirements are met. The second coefficient, hierarchy with incompetence on anarchy, is also statistically significant in Model 40 (β_HI/AR = 0.012, St. err. = 0.003), as opposed to Model 5 (β_HI/AR = 0.012, St. err. = 0.008).
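The arithmetic behind this comparison can be verified with a minimal sketch in Python. Since degrees of freedom are not reported here, a large-sample normal approximation to the t distribution is assumed:

    from scipy.stats import norm

    def two_sided_p(beta, se):
        # the t statistic is the coefficient over its standard error;
        # only the standard error differs between the two models
        z = beta / se
        return z, 2 * norm.sf(abs(z))

    for label, beta, se in [("Model 40, HI/AR", 0.012, 0.003),
                            ("Model 5,  HI/AR", 0.012, 0.008)]:
        z, p = two_sided_p(beta, se)
        print(f"{label}: t = {z:.2f}, p = {p:.4f}")
    # Model 40: t = 4.00, p = 0.0001 (reject the null)
    # Model 5:  t = 1.50, p = 0.1336 (fail to reject: a Type II error)

The same coefficient thus clears the conventional significance threshold in Model 40 but not in Model 5, purely because of the difference in standard errors.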
Finally, note that in Model 5 the F-statistic for the joint nullity of both effects does not lead to the rejection of the null hypothesis, thus suggesting that the structure has no overall effect on problem solving. This conclusion is at odds with the one from Model 40, which leads to a strong rejection of the same hypothesis.
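For readers who want to reproduce a test of this kind, the following is a minimal sketch on hypothetical run-level data; the data frame and its coefficients are illustrative only, and the chapter's actual models include further regressors:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 40  # one observation per run, as in Model 40
    df = pd.DataFrame({"HC": rng.normal(size=n), "HI": rng.normal(size=n)})
    # outcome built with effect sizes of the same magnitude as in Table 11.2
    df["AR"] = (0.007 * df["HC"] + 0.012 * df["HI"]
                + rng.normal(scale=0.02, size=n))

    fit = smf.ols("AR ~ HC + HI", data=df).fit()
    print(fit.f_test("HC = 0, HI = 0"))  # joint nullity of both effects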
In short, the impact of some of the conditions goes unacknowledged in the under-powered study with only 5 runs, leaving important and interesting implications out of the analysis.



11.4.3 Example 2


We conduct a second example to illustrate the risks and problems of over-powering a simulation. In this case, we over-power the simulation by calculating results on 500 runs, with the same parameter specifications used in the example above.
The results of the two simulations are compared in Table 11.3, where we show the estimation outputs of two OLS regression models. In the table, Model 40 shows results for the correctly-powered simulation, while Model 500 refers to the over-powered simulation. The beta coefficients are very close to each other, and the variation is mostly reflected in the standard errors, which decrease in the over-powered simulation. This leads to larger t values, so that the respective probabilities (the p-values) are closer to zero for Model 500 than for Model 40.
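The mechanism is the usual one: the standard error of an estimated effect shrinks roughly as one over the square root of the number of runs R. A minimal sketch with hypothetical run-level noise (the effect and noise sizes are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    true_effect, noise_sd = 0.012, 0.02

    for runs in (40, 500):
        outcomes = true_effect + rng.normal(scale=noise_sd, size=runs)
        se = outcomes.std(ddof=1) / np.sqrt(runs)  # shrinks as 1/sqrt(R)
        print(f"R = {runs:3d}: estimate = {outcomes.mean():.4f}, "
              f"st. err. = {se:.4f}, t = {outcomes.mean() / se:.1f}")
    # the standard error falls by about sqrt(500/40), roughly 3.5 times,
    # and the t value grows accordingly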
From the perspective of accepting or rejecting hypotheses in the regression, there is little or no difference. In fact, most p-values are well below the threshold for statistical significance in both models. This means that, if one is interested only in accepting or rejecting hypotheses, there is no particular difference between the two models.
However, in another article (Secchi and Seri 2017), we warn modelers of the risks of over-power. There we write that over-power hides some dangers: it might be unnecessarily costly (time consuming, for example), it makes small effects appear as significant as larger ones, and it destroys the balance between the two probabilities of Type I and Type II errors.
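The second of these dangers, in particular, can be seen in a short sketch: with enough runs, even a negligible effect crosses the conventional 5% threshold. The effect and noise sizes below are hypothetical, chosen for illustration:

    import numpy as np
    from scipy.stats import norm

    tiny_effect, noise_sd = 0.002, 0.02

    for runs in (5, 40, 500):
        se = noise_sd / np.sqrt(runs)  # standard error shrinks with more runs
        t = tiny_effect / se
        print(f"R = {runs:3d}: t = {t:.2f}, p = {2 * norm.sf(t):.4f}")
    # R =   5: t = 0.22, p = 0.8230
    # R =  40: t = 0.63, p = 0.5271
    # R = 500: t = 2.24, p = 0.0253  (the tiny effect now looks significant)

The practical importance of the effect is unchanged; only the number of runs has made it statistically significant.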