Page 192 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
5.1 Inference on One Population
The runs test assesses the null hypothesis of sequence randomness, using the
sampling distribution of r, given $n_1$ and $n_2$. Tables of this sampling distribution can
be found in the literature. For large $n_1$ or $n_2$ (say above 20) the sampling
distribution of r is well approximated by the normal distribution with the following
parameters:
$$
\mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1; \qquad
\sigma_r^2 = \frac{2 n_1 n_2 \,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 \,(n_1 + n_2 - 1)}. \tag{5.2}
$$
Notice that the number of runs always satisfies 1 ≤ r ≤ n, with $n = n_1 + n_2$. The
null hypothesis is rejected when there are either too few runs (as in Sequence 1) or
too many runs (as in Sequence 2). For the previous sequences, at a 5% level the
critical values of r for $n_1 = n_2 = 6$ are 3 and 11, i.e. the non-critical region of r is
[4, 10]. We therefore reject, at a 5% level, the null hypothesis of randomness for
Sequence 1 (r = 2) and Sequence 2 (r = 12), and do not reject the null hypothesis
for Sequence 3 (r = 7).
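The critical values quoted above can be checked against the exact sampling distribution of r. The following is a minimal sketch, not the routine used in the book: the function names `runs_pmf` and `critical_values` are hypothetical, and the equal-tail rule (at most α/2 probability in each tail) is an assumption about how the tabulated values were obtained.

```python
from math import comb


def runs_pmf(r, n1, n2):
    """Exact P(R = r) under randomness, with n1 and n2 symbols of each kind."""
    total = comb(n1 + n2, n1)  # number of distinct arrangements
    k, odd = divmod(r, 2)
    if k < 1:
        return 0.0  # with both symbols present there are at least 2 runs
    if not odd:
        # r = 2k: k runs of each symbol; either symbol may come first
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:
        # r = 2k + 1: one symbol forms k + 1 runs, the other k runs
        ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
    return ways / total


def critical_values(n1, n2, alpha=0.05):
    """Equal-tail critical values (assumed convention): the largest r_low with
    P(R <= r_low) <= alpha/2 and the smallest r_up with P(R >= r_up) <= alpha/2.
    Either value is None if no such r exists."""
    n = n1 + n2
    r_low, cum = None, 0.0
    for r in range(2, n + 1):  # accumulate the lower tail
        cum += runs_pmf(r, n1, n2)
        if cum > alpha / 2:
            break
        r_low = r
    r_up, cum = None, 0.0
    for r in range(n, 1, -1):  # accumulate the upper tail
        cum += runs_pmf(r, n1, n2)
        if cum > alpha / 2:
            break
        r_up = r
    return r_low, r_up


print(critical_values(6, 6))  # (3, 11)
```

For $n_1 = n_2 = 6$ each tail holds 12 of the 924 equally likely arrangements (probability about 0.013), so under the assumed convention the sketch reproduces the critical values 3 and 11 and the non-critical region [4, 10].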
The runs test is not restricted to dichotomous sequences: it can be applied to any
sequence of values, provided the values are first dichotomised, e.g. around the mean
or the median.
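As an illustration of the dichotomisation step, the sketch below splits a numeric sequence around a cut point and counts the resulting runs. The helper names `dichotomise` and `count_runs` are hypothetical, the data values are made up for the example, and assigning values equal to the cut point to the upper group is an assumed convention.

```python
from statistics import median


def dichotomise(values, cut):
    """Map each value to 1 if it is >= cut, else 0 (ties go to the upper group)."""
    return [1 if v >= cut else 0 for v in values]


def count_runs(symbols):
    """Return (n1, n2, r) for a non-empty 0/1 sequence:
    number of 1s, number of 0s, and number of runs."""
    # A new run starts at every position where the symbol changes.
    r = 1 + sum(a != b for a, b in zip(symbols, symbols[1:]))
    n1 = sum(symbols)
    return n1, len(symbols) - n1, r


# Illustrative values only; they do not come from any dataset in the text.
values = [2.1, 3.4, 1.0, 0.5, 4.2, 5.0, 0.9, 3.3]
symbols = dichotomise(values, median(values))
print(symbols)              # [0, 1, 0, 0, 1, 1, 0, 1]
print(count_runs(symbols))  # (4, 4, 6)
```

Replacing `median(values)` by `statistics.mean(values)` or by a fixed cut point, such as the zero used in Example 5.1, changes only the cut; the run counting is unaffected.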
Example 5.1
Q: Consider the noise sequence in the Signal & Noise dataset (first column)
generated with the “normal random number” routine of EXCEL with zero mean.
The sequence has n = 100 noise values. Use the runs test to assess the randomness
of the sequence.
A: We apply the SPSS runs test command, using an imposed (Custom)
dichotomization around zero, obtaining an observed two-tailed significance of
p = 0.048. Since p is marginally below 0.05, this p-value by itself points to
rejecting the randomness of the sequence at a 5% level of significance, though only
just. We may also use the MATLAB or R runs function. We obtain the
values of Table 5.1. The interval [$n_{low}$, $n_{up}$] represents the non-critical region. We
see that the observed number of runs coincides with the lower end of the interval,
i.e. it falls on the boundary of the non-critical region, so the evidence against
randomness is borderline.
Table 5.1. Results obtained with MATLAB or R runs test for the noise data.

$n_1$   $n_2$   r    $n_{low}$   $n_{up}$
53      47      41   41          61
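The counts in Table 5.1 can be fed into the normal approximation of formula 5.2 to see where the reported significance comes from. This is a sketch only: `runs_normal_test` is a hypothetical helper, and it assumes a plain z statistic without continuity correction, which may differ from what SPSS or the book's runs function does internally.

```python
from math import erfc, sqrt


def runs_normal_test(n1, n2, r):
    """Normal approximation to the runs test (no continuity correction).
    Returns (mu_r, var_r, z, two_tailed_p)."""
    n = n1 + n2
    mu = 2 * n1 * n2 / n + 1
    # 2*n1*n2 - n1 - n2 is written as 2*n1*n2 - n
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (r - mu) / sqrt(var)
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return mu, var, z, p


mu, var, z, p = runs_normal_test(53, 47, 41)
print(round(mu, 2), round(var, 2), round(z, 2), round(p, 3))
# 50.82 24.57 -1.98 0.048
```

Under these assumptions the observed r = 41 lies about 1.98 standard deviations below the expected 50.82 runs, which agrees with the two-tailed significance of 0.048 quoted in the example.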
Example 5.2
Q: Consider the Forest Fires dataset (see Appendix E), which contains the
area (ha) of burnt forest in Portugal during the period 1943-1978. Is there evidence
from this sample, at a 5% significance level, that the area of burnt forest behaves as
a random sequence?