
11 How Many Times Should One Run a Computational Simulation?    235

contains the most extreme values of T, i.e. the tails of its distribution. Most users of statistics stop here, and perform a test verifying whether t belongs to R or to A and, as a consequence, respectively reject H0 or fail to reject it, as part of a ritual (Gigerenzer 2004). In this situation, an alternative way of reaching the same result is to compare the p-value, whenever defined, to a fixed threshold α: if the p-value is smaller than α, we reject H0; otherwise we fail to reject it.
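The equivalence of the two rituals can be checked numerically. The sketch below uses a hypothetical two-sided one-sample z-test (known unit variance, illustrative data); the sample size, effect, and seed are assumptions, not taken from the text:

```python
# Equivalence of "t falls in the rejection region R" and "p-value < alpha"
# for a two-sided one-sample z-test of H0: mu = 0 with known sigma = 1.
import numpy as np
from scipy.stats import norm

alpha = 0.05
n = 30
rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=n)   # illustrative data, true mean 0.3

t = np.sqrt(n) * x.mean()                    # test statistic, N(0, 1) under H0
z_crit = norm.ppf(1 - alpha / 2)             # boundary between A and R

reject_by_region = abs(t) > z_crit           # t belongs to R, the tails
p_value = 2 * (1 - norm.cdf(abs(t)))
reject_by_pvalue = p_value < alpha

assert reject_by_region == reject_by_pvalue  # the two decisions always agree
```

The agreement is not a coincidence of this sample: the p-value is, by construction, the smallest α at which t would land in R, so the two comparisons encode the same decision rule.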
It is interesting to review the relations among the quantities seen until now. We saw before that the effect size d has an impact on β. Since d is a measure of how easy it is to discriminate between H0 and H1, it is generally the case that power, 1 − β, increases with d when α is fixed.⁶ Another factor affecting α and β is the sample size N. In this case too, 1 − β generally increases with N, when α is fixed. Finally, the formulas P_{H0}{T ∈ A} = 1 − α and P_{H1}{T ∈ A} = β show that there is a trade-off between α and β. Indeed, when A gets larger, α decreases while β increases, and vice versa. This explains why, when N and d are fixed, it is not possible to reduce α without consequences on the Type-II error rate β.⁷
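The trade-off can be made concrete with a small numerical sketch. For a hypothetical one-sided z-test with acceptance region A = (−∞, c], enlarging A (raising c) lowers α but raises β; the effect size, sample size, and cutoffs below are illustrative assumptions:

```python
# Alpha/beta trade-off for a one-sided z-test. Under H0 the statistic T is
# N(0, 1); under H1 with effect size d and sample size N it is N(d*sqrt(N), 1).
from scipy.stats import norm

d, n = 0.5, 25                       # hypothesized effect size and sample size
shift = d * n ** 0.5                 # mean of T under H1

for c in (1.28, 1.64, 1.96, 2.33):   # successively larger acceptance regions
    alpha = 1 - norm.cdf(c)          # P_H0{T in R}: Type-I error rate
    beta = norm.cdf(c - shift)       # P_H1{T in A}: Type-II error rate
    print(f"c = {c:.2f}  alpha = {alpha:.3f}  beta = {beta:.3f}")
```

Running the loop shows α shrinking and β growing as c increases, which is exactly why, with N and d fixed, one error rate can only be bought at the expense of the other.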
This is why one cannot simply make α as small as possible: doing so inflates β. This fact suggests that good results could be achieved by balancing the two error rates. This was indeed proposed by Neyman and Pearson in 1933,⁸ and has been revived several times since then. A more recent attempt in this direction is the compromise power analysis of Erdfelder (1984). However, the most common approach is to treat the two sources of error differently.
A first approach completely disregards β: a value for α is rigorously fixed (often as α = 0.05), and the test checks whether t belongs to A or not using a sample whose size N has been selected without reference to β. This approach is the one that most closely resembles the original Fisher paradigm, as the alternative hypothesis has practically no role in it. It is based on the fact that, as N increases, β goes to 0, so that a large sample size guarantees that β will be small enough. A second approach supplements this part of the analysis with the computation of power using a value of d estimated on the basis of the data, a procedure called post hoc power analysis. Because of the large variability of the estimated effect size, this approach is generally regarded with suspicion by statisticians (Korn 1990; Hoenig and Heisey 2001). In the third approach, the researcher fixes α and β, hypothesizes a value of d, and chooses A and N so that both P_{H0}{T ∈ A} = 1 − α and P_{H1}{T ∈ A} = β(d) hold true. This procedure, called a priori power analysis, guarantees that, if d is correctly guessed, the desired values of α and β will be achieved.
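An a priori power analysis can be sketched in a few lines for the simplest case, a two-sided one-sample z-test, where N has a closed form; the function name and default rates below are assumptions for illustration, and more general designs would use a dedicated tool such as G*Power or an iterative solver:

```python
# A priori power analysis for a two-sided one-sample z-test: given alpha,
# beta and a hypothesized effect size d, solve N from the standard relation
# N = ((z_{1-alpha/2} + z_{1-beta}) / d)^2, rounded up to an integer.
import math
from scipy.stats import norm

def required_n(d, alpha=0.05, beta=0.20):
    z_a = norm.ppf(1 - alpha / 2)    # critical value fixing the Type-I rate
    z_b = norm.ppf(1 - beta)         # quantile fixing the desired power
    return math.ceil(((z_a + z_b) / d) ** 2)

print(required_n(0.5))   # medium effect size -> 32 observations
```

As the formula makes explicit, the guarantee is conditional on d: if the hypothesized effect size is wrong, the achieved power will differ from 1 − β.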





⁶ This also explains why in some cases it is possible to increase the power of a test by designing an experiment in which it is expected that the effect size d, if not null, is large. As an example, in ABM this could be done by setting some of the quantities entering the model to their extreme values.
⁷ See also van der Vaart (2000, p. 213) or Choirat and Seri (2012, Proposition 7, p. 285).
⁸ The authors say: "The use of these statistical tools in any given case, in determining just how the balance should be struck, must be left to the investigator" (Neyman and Pearson 1933, p. 296).