Page 209 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

190      5 Non-Parametric Tests of Hypotheses


              $H_0$: $p_1 \le p_2$,   $H_1$: $p_1 > p_2$;  or
              $H_0$: $p_1 \ge p_2$,   $H_1$: $p_1 < p_2$.

              In order to assess the null hypothesis, we use the same goodness of fit measure
           as in formula 5.8, now reflecting the sum of the squared deviations for all four cells
           in the contingency table:

              $$T = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \tag{5.18}$$

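           where the expected absolute frequencies $E_{ij}$ are estimated as:

              $$E_{ij} = \frac{n_i \sum_{k=1}^{2} O_{kj}}{n} = \frac{n_i (O_{1j} + O_{2j})}{n}, \tag{5.19}$$

           with $n_i = O_{i1} + O_{i2}$ (number of cases in sample $i$) and $n = n_1 + n_2$ (total number of cases).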
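As an illustration of formulas 5.18 and 5.19, the following minimal Python sketch computes the expected counts and the statistic T for a 2×2 table. The function names and the counts are made up for this example and do not come from the text; rows are taken to be the two samples and columns the two classes.

```python
def expected_counts(observed):
    """Expected counts E[i][j] = n_i * (O_1j + O_2j) / n (formula 5.19).

    observed[i][j] is the number of cases of class j in sample i.
    """
    row_totals = [sum(row) for row in observed]        # n_1, n_2
    col_totals = [sum(col) for col in zip(*observed)]  # O_1j + O_2j
    n = sum(row_totals)                                # n = n_1 + n_2
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Statistic T of formula 5.18: sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical counts, for illustration only (not data from the text).
observed = [[30, 20], [20, 30]]
expected = expected_counts(observed)
t_value = chi_square_statistic(observed)
print(expected, t_value)
```

With these counts every expected cell equals 25, so each cell contributes (5^2)/25 = 1 to T.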
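              Thus, we estimate the expected count in each cell as the product of the
           corresponding observed marginal counts (sample size $n_i$ and class total
           $O_{1j} + O_{2j}$) divided by $n$. With these estimates, one can rewrite 5.18 as:

              $$T = \frac{n (O_{11} O_{22} - O_{12} O_{21})^2}{n_1 n_2 (O_{11} + O_{21})(O_{12} + O_{22})}. \tag{5.20}$$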
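The equivalence of formulas 5.18 and 5.20 can be checked numerically. The sketch below is only an illustration under assumed counts: the function names and the table values are hypothetical, and the check compares the closed form with the cell-by-cell sum up to floating-point tolerance.

```python
def t_closed_form(o11, o12, o21, o22):
    """Statistic T computed with the closed form of formula 5.20."""
    n1 = o11 + o12  # size of sample 1
    n2 = o21 + o22  # size of sample 2
    n = n1 + n2
    numerator = n * (o11 * o22 - o12 * o21) ** 2
    denominator = n1 * n2 * (o11 + o21) * (o12 + o22)
    return numerator / denominator


def t_cell_sum(o11, o12, o21, o22):
    """Statistic T computed cell by cell (formulas 5.18 and 5.19)."""
    observed = [[o11, o12], [o21, o22]]
    row_totals = [o11 + o12, o21 + o22]
    col_totals = [o11 + o21, o12 + o22]
    n = sum(row_totals)
    total = 0.0
    for i in range(2):
        for j in range(2):
            e_ij = row_totals[i] * col_totals[j] / n
            total += (observed[i][j] - e_ij) ** 2 / e_ij
    return total


# Hypothetical counts, for illustration only (not data from the text).
counts = (12, 8, 5, 15)
t_a = t_closed_form(*counts)
t_b = t_cell_sum(*counts)
print(t_a, t_b, abs(t_a - t_b) < 1e-9)
```

The closed form avoids computing the four expected counts explicitly, which is convenient for hand calculation.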

              The sampling distribution of $T$, assuming that the null hypothesis is true,
           $p_1 = p_2 = p$, can be computed by first noticing that the probability of obtaining
           $O_{11}$ cases of class 1 in a sample of $n_1$ cases from population 1 is given by the
           binomial law (see A.7), with $q = 1 - p$:

              $$P(O_{11}) = \binom{n_1}{O_{11}} p^{O_{11}} q^{n_1 - O_{11}}.$$

              Similarly, for the probability of obtaining $O_{21}$ cases of class 1 in a sample
           of $n_2$ cases from population 2:

              $$P(O_{21}) = \binom{n_2}{O_{21}} p^{O_{21}} q^{n_2 - O_{21}}.$$

              Because the two samples are independent, the probability of the joint event is
           given by:

              $$P(O_{11}, O_{21}) = \binom{n_1}{O_{11}} \binom{n_2}{O_{21}} p^{O_{11} + O_{21}} q^{n - O_{11} - O_{21}}. \tag{5.21}$$
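For small samples, formula 5.21 can be evaluated directly. The following Python sketch is an illustration only: the function names are hypothetical, and the common proportion p, which is unknown in practice, is simply assumed here. It also checks that the joint probabilities factor into the two binomial terms and sum to one over all possible pairs of counts.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Binomial probability of k cases of class 1 in a sample of n cases."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def joint_probability(o11, o21, n1, n2, p):
    """Joint probability of formula 5.21 under the assumption p1 = p2 = p."""
    q = 1.0 - p
    n = n1 + n2
    return comb(n1, o11) * comb(n2, o21) * p ** (o11 + o21) * q ** (n - o11 - o21)


# Assumed values, for illustration only (not data from the text).
n1, n2, p = 4, 3, 0.3

# Independence: the joint probability is the product of the two binomials.
joint = joint_probability(2, 1, n1, n2, p)
product = binomial_pmf(2, n1, p) * binomial_pmf(1, n2, p)

# The probabilities of all (O11, O21) pairs must add up to one.
total = sum(
    joint_probability(a, b, n1, n2, p)
    for a in range(n1 + 1)
    for b in range(n2 + 1)
)
print(joint, product, total)
```

The number of pairs to enumerate is (n1 + 1)(n2 + 1), and an exact distribution of T would require evaluating T for each of them.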

              The exact values of $P(O_{11}, O_{21})$ are, however, very difficult to compute,
           except for very small $n_1$ and $n_2$ (see e.g. Conover, 1980). Fortunately, the
           asymptotic distribution of $T$ is well approximated by the chi-square distribution with one