Page 192 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

5.1 Inference on One Population


              The runs test assesses the null hypothesis of sequence randomness, using the
           sampling distribution of r, given n_1 and n_2. Tables of this sampling distribution can
           be found in the literature. For large n_1 or n_2 (say above 20) the sampling
           distribution of r is well approximated by the normal distribution with the following
           parameters:

              $$\mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1; \qquad \sigma_r^2 = \frac{2 n_1 n_2 \,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 \,(n_1 + n_2 - 1)}. \tag{5.2}$$
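The parameters in (5.2) can be evaluated directly. The following Python sketch is illustrative only: the function names `runs_moments` and `runs_z_test` are hypothetical, the counts in the usage lines are made-up values, and no continuity correction is applied, which may differ from what a particular statistical package does.

```python
from math import erfc, sqrt


def runs_moments(n1, n2):
    """Mean and variance of the number of runs r under randomness, as in (5.2)."""
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    return mean, var


def runs_z_test(r, n1, n2):
    """Normal approximation: z score and two-tailed p-value for an observed r.

    Assumption: no continuity correction.
    """
    mean, var = runs_moments(n1, n2)
    z = (r - mean) / sqrt(var)
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical counts, chosen only to show the call.
z, p = runs_z_test(r=20, n1=30, n2=28)
print(f"z = {z:.3f}, two-tailed p = {p:.4f}")
```

Small p-values arise both for too few runs (z strongly negative) and for too many runs (z strongly positive), since the test is two-tailed.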

              Notice that the number of runs always satisfies 1 ≤ r ≤ n, with n = n_1 + n_2. The
           null hypothesis is rejected when there are either too few runs (as in Sequence 1) or
           too many runs (as in Sequence 2). For the previous sequences, at a 5% level the
           critical values of r for n_1 = n_2 = 6 are 3 and 11, i.e. the non-critical region of r is
           [4, 10]. We therefore reject, at a 5% level, the null hypothesis of randomness for
           Sequence 1 (r = 2) and Sequence 2 (r = 12), and do not reject the null hypothesis
           for Sequence 3 (r = 7).
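For small samples the critical values can be computed instead of read from tables. The sketch below uses the standard combinatorial form of the exact null distribution of r, which is not given in the text above, and assumes the significance level is split equally between the two tails. The function names are hypothetical.

```python
from math import comb


def runs_pmf(n1, n2):
    """Exact null distribution of the number of runs r, for n1, n2 >= 1."""
    total = comb(n1 + n2, n1)  # number of distinct arrangements
    pmf = {}
    for r in range(2, n1 + n2 + 1):
        k, odd = divmod(r, 2)
        if odd:  # r = 2k + 1
            ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                    + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
        else:    # r = 2k
            ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        if ways:
            pmf[r] = ways / total
    return pmf


def noncritical_region(n1, n2, alpha=0.05):
    """Smallest and largest r that do not lead to rejection.

    Assumption: each tail may hold at most alpha / 2 probability.
    """
    pmf = runs_pmf(n1, n2)
    values = sorted(pmf)
    low, cum = values[0], 0.0
    for r in values:
        if cum + pmf[r] > alpha / 2:
            low = r
            break
        cum += pmf[r]
    high, cum = values[-1], 0.0
    for r in reversed(values):
        if cum + pmf[r] > alpha / 2:
            high = r
            break
        cum += pmf[r]
    return low, high


print(noncritical_region(6, 6))  # (4, 10), i.e. critical values 3 and 11
```

For n_1 = n_2 = 6 the lower tail P(r ≤ 3) = 12/924 ≈ 0.013 stays below 0.025, whereas P(r ≤ 4) = 62/924 ≈ 0.067 does not, which is why 3 is a critical value and 4 is not.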
              The runs test can be applied to any sequence of values, not necessarily
           dichotomous, provided the values are first dichotomised, e.g. around the mean or
           the median.
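A minimal Python sketch of this dichotomisation step follows. The data values are invented for illustration, the helper names are hypothetical, and assigning values equal to the threshold to the lower group is an assumption; other tie conventions are possible.

```python
from statistics import median


def dichotomise(values, threshold):
    """Label each value True if above the threshold, False otherwise.

    Assumption: values equal to the threshold fall in the lower group.
    """
    return [v > threshold for v in values]


def count_runs(labels):
    """Number of runs: one more than the number of label changes."""
    if not labels:
        return 0
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


# Invented data, dichotomised around the median (2.75 here).
values = [2.1, 3.4, 0.5, 0.9, 4.2, 5.0, 1.1, 3.9]
labels = dichotomise(values, median(values))
n1 = sum(labels)          # values above the median
n2 = len(labels) - n1     # values at or below the median
r = count_runs(labels)
print(n1, n2, r)  # 4 4 6
```

The triple (n_1, n_2, r) obtained this way is all the runs test needs, whichever threshold is chosen.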

           Example 5.1

           Q: Consider the noise sequence in the Signal & Noise dataset (first column)
           generated with the “normal random number” routine of EXCEL with zero mean.
           The sequence has n = 100 noise values. Use the runs test to assess the randomness
           of the sequence.
           A:  We apply the SPSS runs test command, using an imposed (Custom)
           dichotomization around zero, obtaining an observed two-tailed significance of
           p = 0.048, which lies only marginally below the 5% level of significance, so the
           evidence against randomness is borderline. We may also use the MATLAB or R runs
           function, obtaining the values of Table 5.1. The interval [n_low, n_up] represents
           the non-critical region. We see that the observed number of runs coincides with
           the lower end of the interval; by this criterion, the randomness of the sequence
           is not rejected at a 5% level.

           Table 5.1. Results obtained with MATLAB or R runs test for the noise data.

                 n_1          n_2          r            n_low        n_up
                 53           47           41           41           61
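As a check on the arithmetic, the normal approximation (5.2) can be applied to the counts of Table 5.1. The working below assumes no continuity correction, which may not match every package; under that assumption it agrees with the significance p = 0.048 reported in the answer.

```latex
\mu_r = \frac{2 \cdot 53 \cdot 47}{100} + 1 = 50.82, \qquad
\sigma_r^2 = \frac{4982 \,(4982 - 100)}{100^2 \cdot 99} \approx 24.57, \\[4pt]
z = \frac{41 - 50.82}{\sqrt{24.57}} \approx -1.98, \qquad
p = 2\,\Phi(-1.98) \approx 0.048 .
```

Here 4982 = 2 · 53 · 47 and Φ denotes the standard normal distribution function.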


           Example 5.2

           Q: Consider the Forest Fires dataset (see Appendix E), which contains the
           area (ha) of burnt forest in Portugal during the period 1943-1978. Is there evidence
           from this sample, at a 5% significance level, that the area of burnt forest behaves as
           a random sequence?