Statistics and Data Analysis in Geology - Chapter 4
One aspect that we have not considered, however, is the order in which the
heads appear. We probably would regard a sequence such as
HHHHHHHHHHHTTTTTTTTT
as being very strange, although the probability of obtaining this many heads in 20
trials is the same as in the preceding example. At the other extreme, the regular
alternation of heads and tails
HTHTHTHTHTHTHTHTHTHH
would also appear very unusual to us, although the probability of the number of
heads is unchanged. What arouses our suspicions is not the proportion of heads
but the order in which they appear. We assume that heads and tails will occur at
random; in the two preceding examples, it seems very unlikely that they have.
We can test these sequences for randomness of occurrence by examining the
number of runs. Runs are defined as uninterrupted sequences of the same state.
The first set of trials contains 13 runs, the second only 2, and the third contains
19. The runs in the first set of trials are underlined and numbered:
[Figure: the first set of 20 trials, with its 13 runs underlined and numbered 1 through 13 from (Start) to (End)]
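Counting runs is mechanical. One run starts at the beginning of the sequence, and a new one starts wherever the state changes. The following is a minimal Python sketch that checks the run counts quoted for the second and third sequences. It assumes each sequence is written as a string of `H` and `T` characters, one per trial. The helper name `count_runs` is hypothetical and does not come from the text.

```python
from itertools import groupby


def count_runs(sequence: str) -> int:
    """Number of runs: maximal blocks of identical consecutive symbols.

    Hypothetical helper for illustration; `sequence` is assumed to be a
    string such as "HTTH", one character per trial.
    """
    # groupby emits one group per uninterrupted block of the same state
    return sum(1 for _ in groupby(sequence))


second = "H" * 11 + "T" * 9   # 11 heads followed by 9 tails
third = "HT" * 9 + "HH"       # alternation that ends in a pair of heads

print(count_runs(second), second.count("H"))  # 2 11
print(count_runs(third), third.count("H"))    # 19 11
```

Both sequences contain the same number of heads but very different numbers of runs. That difference is what the runs test exploits.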
We can calculate the probability that a given sequence of runs was created by
the random occurrence of two states (heads and tails, in this example). This is done
by enumerating all possible ways of arranging $n_1$ items of state 1 and $n_2$ items of
state 2. The total number of runs in a sequence is denoted $U$; tables are available
that give critical values of $U$ for specified $n_1$, $n_2$, and level of significance $\alpha$.
However, if $n_1$ and $n_2$ each exceed ten, the distribution of $U$ can be closely
approximated by a normal distribution, and we can use tables of the standard normal
variate $z$ for our statistical tests. The mean number of runs expected in a randomly
generated sequence of $n_1$ items of state 1 and $n_2$ items of state 2 is

$$\mu_U = \frac{2n_1n_2}{n_1+n_2} + 1 \qquad (4.8)$$
The variance of the number of runs is

$$\sigma_U^2 = \frac{2n_1n_2\,(2n_1n_2 - n_1 - n_2)}{(n_1+n_2)^2\,(n_1+n_2-1)} \qquad (4.9)$$
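Equations (4.8) and (4.9) summarize the distribution that the enumeration described earlier would produce. For small $n_1$ and $n_2$ the enumeration is cheap. The sketch below lists every arrangement, tallies $U$, and compares the exact mean and variance with the two formulas using exact fractions. All helper names are hypothetical. `prob_at_most` shows how a lower-tail probability, which is the basis of a table of critical values, follows from the same tally.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def runs_mean(n1: int, n2: int) -> Fraction:
    # Equation (4.8): mu_U = 2 n1 n2 / (n1 + n2) + 1
    return Fraction(2 * n1 * n2, n1 + n2) + 1


def runs_variance(n1: int, n2: int) -> Fraction:
    # Equation (4.9): 2 n1 n2 (2 n1 n2 - n1 - n2) / ((n1 + n2)^2 (n1 + n2 - 1))
    n = n1 + n2
    return Fraction(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2), n * n * (n - 1))


def exact_runs_distribution(n1: int, n2: int) -> Counter:
    """Tally U over every distinct arrangement of n1 state-1 and n2 state-2 items.

    An arrangement is fixed by the positions of the state-1 items, so there
    are C(n1 + n2, n1) of them, all equally likely under randomness.
    """
    n = n1 + n2
    tally = Counter()
    for positions in combinations(range(n), n1):
        seq = [2] * n
        for p in positions:
            seq[p] = 1
        # One run at the start, plus one more at every change of state
        u = 1 + sum(seq[i] != seq[i - 1] for i in range(1, n))
        tally[u] += 1
    return tally


def moments(tally: Counter) -> tuple:
    """Exact mean and variance of U computed from the tally."""
    total = sum(tally.values())
    mean = Fraction(sum(u * c for u, c in tally.items()), total)
    var = sum((u - mean) ** 2 * c for u, c in tally.items()) / total
    return mean, var


def prob_at_most(tally: Counter, u: int) -> Fraction:
    """P(U <= u) under randomness, i.e. a lower-tail probability."""
    total = sum(tally.values())
    return Fraction(sum(c for k, c in tally.items() if k <= u), total)


tally = exact_runs_distribution(5, 4)
print(sorted(tally.items()))
# [(2, 2), (3, 7), (4, 24), (5, 30), (6, 36), (7, 18), (8, 8), (9, 1)]
print(moments(tally) == (runs_mean(5, 4), runs_variance(5, 4)))  # True
print(prob_at_most(tally, 2))  # 1/63
```

The number of arrangements grows as $\binom{n_1+n_2}{n_1}$. That growth is why published tables and the normal approximation take over for larger samples.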
By these equations, we can determine the mean and the standard deviation (the
standard error) of the number of runs over all possible arrangements of $n_1$ and $n_2$
items. Having calculated these, we can form a z-test with Equation (4.10), where $U$
is the observed number of runs:

$$z = \frac{U - \mu_U}{\sigma_U} \qquad (4.10)$$
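The following is a minimal sketch that combines Equations (4.8) to (4.10) and applies the z-test to the second and third sequences. It makes two assumptions: heads are taken as state 1, and the two-tailed probability is computed from the standard normal with `math.erfc` rather than read from a table. Both sequences contain only 9 tails, so they fall short of the rule that $n_1$ and $n_2$ each exceed ten. The z-values are therefore illustrative, and exact tables of $U$ are the safer reference at these sample sizes. The function name `runs_z_test` is hypothetical.

```python
import math


def runs_z_test(sequence: str, state1: str = "H") -> tuple:
    """Return (U, z, two-tailed p) for the runs test on a two-state sequence.

    Hypothetical helper; the p-value relies on the normal approximation.
    """
    n1 = sequence.count(state1)
    n2 = len(sequence) - n1
    n = n1 + n2
    # Observed runs: one at the start plus one at every change of state
    u = 1 + sum(a != b for a, b in zip(sequence, sequence[1:]))
    mean = 2 * n1 * n2 / n + 1                                        # Equation (4.8)
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))  # Equation (4.9)
    z = (u - mean) / math.sqrt(var)                                   # Equation (4.10)
    p = math.erfc(abs(z) / math.sqrt(2))  # area beyond |z| in both tails of N(0, 1)
    return u, z, p


second = "H" * 11 + "T" * 9   # 11 heads followed by 9 tails
third = "HT" * 9 + "HH"       # alternation that ends in a pair of heads

for seq in (second, third):
    u, z, p = runs_z_test(seq)
    print(u, round(z, 2), p < 0.05)
# 2 -4.13 True
# 19 3.76 True
```

A large negative z signals too few runs (clustering), and a large positive z signals too many (excessive alternation). The two contrived sequences fall at opposite extremes.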
You will recognize that Equation (4.10) is simply Equation (2.37) rewritten to include the runs
statistics. We can formulate a variety of statistical hypotheses which can be tested
with this statistic. For example, we may wish to see if a sequence contains more