
11 How Many Times Should One Run a Computational Simulation?    233

Table 11.1 Table of possible outcomes for a Neyman–Pearson test

              | H_0 true                        | H_1 true
  H_0 chosen  | True negative                   | Type-II error or false negative
  H_1 chosen  | Type-I error or false positive  | True positive


are as extreme or more extreme than the one computed on the sample. The p-value
is sometimes (erroneously) perceived as a measure of the strength of support for the
null hypothesis. It is clear, however, that this method cannot offer anything more
than mild support to H_0, especially because it lacks a hypothesis that holds true
when H_0 does not.
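
To make the definition of the p-value concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the test statistic follows a standard normal distribution under H_0 and that "extreme" means large in absolute value; the function name `two_sided_p_value` is hypothetical and does not come from the text.

```python
from statistics import NormalDist


def two_sided_p_value(z_observed: float) -> float:
    """Probability, under H_0, of a value at least as extreme as z_observed.

    Assumption (illustrative only): the statistic is N(0, 1) under H_0,
    and "as extreme or more extreme" means |Z| >= |z_observed|.
    """
    # Upper-tail probability beyond |z_observed| under the standard normal.
    upper_tail = 1.0 - NormalDist().cdf(abs(z_observed))
    # Double it to account for both tails of the symmetric distribution.
    return 2.0 * upper_tail
```

A small p-value says only that the observed value would be unusual if H_0 held; as noted above, it does not measure support for H_0.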
  This approach to testing was amended by J. Neyman and E.S. Pearson, who
modified it to allow for the possibility of decision and action. The new theory starts
with the introduction of the null hypothesis, H_0, and the alternative hypothesis,
H_1, which is supposed to be true when H_0 is not. The hypothesis H_0 is generally,
but not always, associated with the absence of an effect (of one variable on another,
for example), while H_1 is generally associated with its presence. The researcher
is uncertain as to whether H_0 or H_1 holds true. The decision between these two
hypotheses is made, as in a trial, on the basis of the available data (we will see
later how).² This leads to a table of possible outcomes; see Table 11.1. The use of
positive and negative to denote, respectively, the choice of H_1 and H_0 comes from
the medical use of the same terms, where they indicate the positive or negative result
of a medical test. A negative, i.e. a result in which the disease is not detected, can
be either true or false, depending on whether or not the unobserved true hypothesis
coincides with the choice made by the procedure; the same holds true for a positive.
A false positive is also called, in more statistical terms, a Type-I error, while a
false negative is also called a Type-II error. These two “sources of error” (Neyman
and Pearson 1928, p. 177) exist whichever method is used to choose between H_0 and H_1.
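
The four outcomes described in this paragraph can be written as a small lookup. The sketch below is only an illustration; the function name `classify_outcome` and its boolean arguments are hypothetical.

```python
def classify_outcome(h1_is_true: bool, h1_chosen: bool) -> str:
    """Label a decision between H_0 and H_1 given the (unobserved) truth.

    "Positive" means H_1 was chosen, "negative" means H_0 was chosen;
    the outcome is "true" when the choice matches the true hypothesis.
    """
    if h1_chosen:
        # H_1 chosen: correct if H_1 holds, otherwise a Type-I error.
        return "true positive" if h1_is_true else "false positive (Type-I error)"
    # H_0 chosen: correct if H_0 holds, otherwise a Type-II error.
    return "false negative (Type-II error)" if h1_is_true else "true negative"
```

In practice the first argument is never observed, which is why both kinds of error remain possible whatever decision rule is used.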
  The standard procedure to decide between H_0 and H_1 is to consider a statistic
T whose distribution is known under H_0 (let us denote the probability as P_{H_0}).
The researcher builds an acceptance region A such that, when t belongs to A, H_0 is
chosen as the true hypothesis. The possible values of t that are not contained in A
form a rejection region R. Therefore A and R make up the entire space in which T
varies, and they are generally chosen in such a way that:³

                        P_{H_0}{T ∈ A} = 1 − α
                        P_{H_0}{T ∈ R} = α
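
The construction of A and R can be sketched in Python under stated assumptions. Here the statistic T is assumed to be standard normal under H_0 and the acceptance region is taken to be a symmetric interval; neither choice comes from the text, and other regions with the same probability are possible. All function names are hypothetical. The simulation draws T under H_0 and counts how often it falls in R, which should happen with a frequency close to α.

```python
import random
from statistics import NormalDist


def acceptance_region(alpha: float) -> tuple[float, float]:
    """Symmetric interval A = [-c, c] with P_H0{T in A} = 1 - alpha.

    Assumption (illustrative only): T is N(0, 1) under H_0.
    """
    c = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (-c, c)


def choose_hypothesis(t: float, region: tuple[float, float]) -> str:
    """Choose H0 when t lies in the acceptance region A, otherwise H1."""
    low, high = region
    return "H0" if low <= t <= high else "H1"


def simulated_type_i_rate(alpha: float, n_runs: int = 100_000, seed: int = 0) -> float:
    """Estimate P_H0{T in R} by drawing T from its H_0 distribution."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    region = acceptance_region(alpha)
    rejections = sum(
        choose_hypothesis(rng.gauss(0.0, 1.0), region) == "H1"
        for _ in range(n_runs)
    )
    return rejections / n_runs
```

Calling `simulated_type_i_rate(0.05)` gives an empirical frequency of Type-I errors that can be compared with the nominal α = 0.05.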




2 The metaphor of the trial was introduced in Neyman and Pearson (1933, p. 296) but has been
criticized as misleading in Liu and Stone (2007).
3 Here ∈ means “belongs to,” so that T ∈ A means “T belongs to A.”