Page 114 - Statistics and Data Analysis in Geology

Statistics and Data Analysis in  Geology - Chapter 4

                 One aspect that we have not considered, however, is the order in which the
             heads appear. We probably would regard a sequence such as
                                HHHHHHHHHHHTTTTTTTTT
             as being very strange, although the probability of obtaining this many heads in 20
             trials is the same as in the preceding example. At the other extreme, the regular
             alternation of heads and tails
                                HTHTHTHTHTHTHTHTHTHH

             would also appear very unusual to us, although the probability of the number of
             heads is unchanged. What arouses our suspicions is not the proportion of heads
             but the order in which they appear. We assume that heads and tails will occur at
             random; in the two preceding examples, it seems very unlikely that they have.
                 We can test these sequences for randomness of occurrence by examining the
             number of runs. Runs are defined as uninterrupted sequences of the same state.
             The first set of trials contains 13 runs, the second only 2, and the third contains
             19. The runs in the first sequence are set apart and numbered here:

                   (Start)  H  T  HH  T  H  TTT  H  T  H  T   H   TT  HHH  (End)
                            1  2  3   4  5  6    7  8  9  10  11  12  13
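
Counting runs by eye is error-prone, so a short script can help check the figures quoted above. The following is a minimal sketch in Python; the function name `count_runs` and the variable names are our own, and the two test sequences are the clustered and alternating sequences displayed earlier (11 heads and 9 tails each).

```python
from itertools import groupby


def count_runs(sequence):
    """Return the number of runs, i.e. maximal blocks of one repeated state."""
    # groupby yields one group per uninterrupted block of equal symbols.
    return sum(1 for _ in groupby(sequence))


# The two displayed sequences, built piecewise to avoid miscounting letters.
clustered = "H" * 11 + "T" * 9      # HHHHHHHHHHHTTTTTTTTT
alternating = "HT" * 9 + "HH"       # HTHTHTHTHTHTHTHTHTHH

for name, seq in [("clustered", clustered), ("alternating", alternating)]:
    print(name, "heads:", seq.count("H"), "runs:", count_runs(seq))
```

Both sequences have the same number of heads, yet the run counts (2 and 19) sit at opposite extremes, which is exactly the feature the runs test exploits.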

                 We can calculate the probability that a given sequence of runs was created by
             the random occurrence of two states (heads and tails, in this example). This is done
             by enumerating all possible ways of arranging n1 items of state 1 and n2 items of
             state 2. The total number of runs in a sequence is denoted U; tables are available
             which give critical values of U for specified n1, n2, and level of significance, α.
             However, if n1 and n2 each exceed ten, the distribution of U can be closely
             approximated by a normal distribution, and we can use tables of the standard normal
             variate z for our statistical tests. The expected mean number of runs in a randomly
             generated sequence of n1 items of state 1 and n2 items of state 2 is

             $$\bar{U} = \frac{2 n_1 n_2}{n_1 + n_2} + 1 \tag{4.8}$$

             The expected variance in the mean number of runs is

             $$\sigma_U^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)} \tag{4.9}$$

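
The formulas for the mean and variance of U can be checked against a brute-force enumeration of every arrangement, which is the procedure the text describes for small samples. The sketch below is illustrative only: the function names are our own, the small case (4 items of state 1, 3 of state 2) is an arbitrary choice that keeps the enumeration tiny, and the final lines assume the 11 heads and 9 tails of the coin sequences shown earlier.

```python
from fractions import Fraction
from itertools import combinations, groupby


def expected_runs(n1, n2):
    """Mean number of runs: 2*n1*n2 / (n1 + n2) + 1."""
    return Fraction(2 * n1 * n2, n1 + n2) + 1


def variance_runs(n1, n2):
    """Variance of the number of runs, as in Equation (4.9)."""
    n = n1 + n2
    return Fraction(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2), n * n * (n - 1))


def enumerate_runs(n1, n2):
    """Run counts for every distinct arrangement of n1 H's and n2 T's."""
    n = n1 + n2
    counts = []
    for head_positions in combinations(range(n), n1):
        heads = set(head_positions)
        seq = ["H" if i in heads else "T" for i in range(n)]
        counts.append(sum(1 for _ in groupby(seq)))
    return counts


# Small case: 35 equally likely arrangements, compared exactly via fractions.
counts = enumerate_runs(4, 3)
mean = Fraction(sum(counts), len(counts))
var = sum((c - mean) ** 2 for c in counts) / len(counts)
assert mean == expected_runs(4, 3)
assert var == variance_runs(4, 3)

# Coin example, assuming 11 heads and 9 tails.
print(float(expected_runs(11, 9)))   # 10.9
print(float(variance_runs(11, 9)))   # roughly 4.64
```

Exact fractions are used so that the enumerated and closed-form values can be compared with `==` rather than with a floating-point tolerance.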
                 By these equations, we can determine the mean number of runs and the standard
             error of the mean number of runs in all possible arrangements of n1 and n2
             items. Having calculated these, we can create a z-test by Equation (4.10), where U
             is the observed number of runs:

             $$z = \frac{U - \bar{U}}{\sigma_U} \tag{4.10}$$
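
As a worked illustration, the z statistic can be computed for the three run counts quoted earlier (13, 2, and 19). This is a sketch under stated assumptions: the function names are our own, n1 = 11 and n2 = 9 are taken from the displayed coin sequences, and the two-tailed probability comes from the standard normal distribution via `math.erfc`. Because n2 = 9 falls just short of the "each exceed ten" guideline, the results should be read as illustrative rather than as a substitute for the tabulated critical values of U.

```python
import math


def runs_z(u, n1, n2):
    """z statistic for an observed number of runs u (normal approximation)."""
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    return (u - mean) / math.sqrt(variance)


def two_tailed_p(z):
    """Probability of a standard normal deviate at least as extreme as z."""
    return math.erfc(abs(z) / math.sqrt(2))


# Observed run counts from the text; 11 heads and 9 tails assumed for each.
for label, u in [("first", 13), ("second", 2), ("third", 19)]:
    z = runs_z(u, 11, 9)
    print(f"{label:>6}: U = {u:2d}  z = {z:6.2f}  p = {two_tailed_p(z):.4f}")
```

Under these assumptions the first sequence gives z near 1, well within the range expected of a random sequence, while the clustered and alternating sequences give z near -4.1 and +3.8, respectively: too few runs in one case and too many in the other.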


             You will recognize that this is simply Equation (2.37) rewritten to include the runs
             statistics. We can formulate a variety of statistical hypotheses which can be tested
             with this statistic. For example, we may wish to see if a sequence contains more
