Page 213 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

194      5 Non-Parametric Tests of Hypotheses


The r×c contingency table is shown in Figure 5.4. All samples from the r populations are assumed to be independent and randomly drawn, and every observation is assumed to be categorised into exactly one of c categories. The total number of cases is:

$$ n = n_1 + n_2 + \dots + n_r = c_1 + c_2 + \dots + c_c, $$

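where the $n_i$ are the row counts (the sizes of the samples drawn from the r populations) and the $c_j$ are the column counts, i.e., the total number of observations in the jth class:

$$ c_j = \sum_{i=1}^{r} O_{ij}. $$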
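The marginal counts can be illustrated with a small sketch. The 2×3 table of observed counts below is hypothetical (it is not data from the book); the code simply sums rows and columns and checks that both marginal sums give the same total n.

```python
# Hypothetical 2x3 table of observed counts O[i][j]:
# rows = populations (r = 2), columns = classes (c = 3).
O = [
    [10, 20, 30],
    [15, 25, 20],
]

# Row counts n_i: size of the sample drawn from population i.
row_totals = [sum(row) for row in O]

# Column counts c_j = sum over i of O_ij: observations falling in class j.
col_totals = [sum(O[i][j] for i in range(len(O))) for j in range(len(O[0]))]

# Grand total n; summing the rows or the columns must give the same value.
n = sum(row_totals)
assert n == sum(col_totals)

print(row_totals, col_totals, n)
```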
Let $p_{ij}$ denote the probability that a randomly selected case of population i belongs to class j. The hypotheses for the r×c contingency table generalise the two-sided hypotheses for the 2×2 contingency table (see section 5.2.1):

  H0:  For every class, the probabilities are the same for all populations: $p_{1j} = p_{2j} = \dots = p_{rj}, \ \forall j$.
  H1:  In at least one class, at least two populations have different probabilities: $\exists\, i, k, j : p_{ij} \neq p_{kj}$.

The test statistic is also a generalisation of formula 5.18:

$$ T = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad \text{with} \quad E_{ij} = \frac{n_i c_j}{n}. \tag{5.23} $$
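Formula 5.23 can be sketched in a few lines of Python using only the standard library. This is an illustration, not the book's procedure (the book relies on the commands of Commands 5.7); the function names `expected_counts` and `chi_square_statistic` and the 2×3 table of counts are hypothetical.

```python
def expected_counts(O):
    """E_ij = n_i * c_j / n, estimated from the row and column marginal counts."""
    row_totals = [sum(row) for row in O]
    col_totals = [sum(col) for col in zip(*O)]
    n = sum(row_totals)
    return [[ni * cj / n for cj in col_totals] for ni in row_totals]


def chi_square_statistic(O):
    """T = sum over i and j of (O_ij - E_ij)^2 / E_ij  (formula 5.23)."""
    E = expected_counts(O)
    return sum(
        (o - e) ** 2 / e
        for O_row, E_row in zip(O, E)
        for o, e in zip(O_row, E_row)
    )


# Hypothetical counts: 2 populations (rows) x 3 classes (columns).
O = [[10, 20, 30], [15, 25, 20]]
print(expected_counts(O))
print(chi_square_statistic(O))
```

For this table every row total is 60, so both rows receive the same expected counts, and each cell contributes its squared deviation scaled by $E_{ij}$.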

If H0 is true, we expect the observed counts $O_{ij}$ to be close to the expected counts $E_{ij}$, which are estimated from the row and column marginal counts as in formula 5.23. The asymptotic distribution of T is the chi-square distribution with df = (r − 1)(c − 1) degrees of freedom. As with the chi-square goodness-of-fit test described in section 5.1.3, the approximation is considered acceptable if the following conditions are met:

  i.  For df = 1, i.e. for 2×2 contingency tables, no $E_{ij}$ may be smaller than 5;
  ii.  For df > 1, no $E_{ij}$ may be smaller than 1, and no more than 20% of the $E_{ij}$ may be smaller than 5.
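The degrees of freedom, the two acceptability conditions and the asymptotic p-value can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `chi2_sf` and `approximation_ok` are hypothetical, and the chi-square upper tail is computed with the standard recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1), which only covers an integer number of degrees of freedom (always the case for (r − 1)(c − 1)). The expected counts and T used at the end belong to a hypothetical 2×3 table.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1.

    Starts from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2) and adds
    (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1) until k reaches df.
    """
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1   # Q(x; 1)
    else:
        q, k = math.exp(-x / 2), 2              # Q(x; 2)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def approximation_ok(E):
    """Check conditions i. and ii. on the table of expected counts E."""
    df = (len(E) - 1) * (len(E[0]) - 1)
    cells = [e for row in E for e in row]
    if df == 1:
        # Condition i.: no expected count below 5.
        return min(cells) >= 5
    # Condition ii.: none below 1, at most 20% below 5.
    small = sum(1 for e in cells if e < 5)
    return min(cells) >= 1 and small <= 0.2 * len(cells)


# Expected counts and statistic T of a hypothetical 2x3 table.
E = [[12.5, 22.5, 25.0], [12.5, 22.5, 25.0]]
T = 32 / 9
df = (len(E) - 1) * (len(E[0]) - 1)   # (r - 1)(c - 1) = 2
if approximation_ok(E):
    print(df, chi2_sf(T, df))
```

For df = 2 the recurrence reduces to Q(x; 2) = e^(−x/2), which gives a quick hand check of the printed p-value.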

The SPSS, STATISTICA, MATLAB and R commands for testing r×c contingency tables are indicated in Commands 5.7.

           Example 5.11
Q: Consider the male and female populations of the Freshmen dataset. Based on the evidence provided by the respective samples, is it possible to conclude that