Page 215 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

196      5 Non-Parametric Tests of Hypotheses


(nominal or ordinal) or continuous. In the latter case, one must choose suitable
categorisations for the continuous variables.
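One simple way to categorise a continuous variable is to fix a set of cut points and assign each value to the interval it falls in. The sketch below assumes sorted, user-chosen cut points. The function name `categorise`, the cut points and the scores are all hypothetical and serve only as an illustration. Choosing the categories remains a substantive decision that can affect the outcome of the test.

```python
import bisect


def categorise(values, cut_points, labels):
    """Map each continuous value to an ordinal category label.

    cut_points must be sorted ascending. A value v gets labels[k] when
    cut_points[k-1] <= v < cut_points[k]; the first and last categories are
    open-ended. Values equal to a cut point go to the upper category.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    # bisect_right counts how many cut points are <= v, i.e. the category index.
    return [labels[bisect.bisect_right(cut_points, v)] for v in values]


# Hypothetical scores on a 0-20 scale, split at 10 and 14.
scores = [7.5, 10.0, 13.9, 14.0, 18.2]
print(categorise(scores, [10, 14], ["low", "medium", "high"]))
# ['low', 'medium', 'medium', 'high', 'high']
```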
The $r \times c$ contingency table for this situation is the same as shown in Figure 5.4. The only difference is that, whereas in the previous section the rows represented different populations and the row totals were assumed to be fixed, the rows now represent categories of a second variable, and the row totals can vary arbitrarily, constrained only by the fact that their sum is the total number of cases.
The test is formalised as:

$H_0$: The event “an observation is in row $i$” is independent of the event “the same observation is in column $j$”, i.e.:

$$P(\text{row } i, \text{column } j) = P(\text{row } i)\, P(\text{column } j), \quad \forall\, i, j.$$

$H_1$: The events “an observation is in row $i$” and “the same observation is in column $j$” are dependent, i.e.:

$$\exists\, i, j: \quad P(\text{row } i, \text{column } j) \neq P(\text{row } i)\, P(\text{column } j).$$
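The hypotheses concern population probabilities, which the test never observes directly. Purely to illustrate what $H_0$ asserts, the following sketch checks the factorisation condition cell by cell on exact joint probabilities. The helper `is_independent` and both tables are hypothetical. Exact `Fraction` arithmetic is used so that the equality test is not disturbed by floating-point rounding. The two tables have identical marginals, which shows that independence cannot be read from the row and column totals alone.

```python
from fractions import Fraction as F


def is_independent(joint):
    """True when P(row i, column j) == P(row i) * P(column j) for every cell.

    joint: list of rows of exact probabilities (Fractions) summing to 1.
    """
    row_p = [sum(row) for row in joint]          # P(row i)
    col_p = [sum(col) for col in zip(*joint)]    # P(column j)
    return all(joint[i][j] == row_p[i] * col_p[j]
               for i in range(len(joint))
               for j in range(len(joint[0])))


# Hypothetical joint distributions, both with marginals (1/4, 3/4) and (1/3, 2/3).
independent = [[F(1, 12), F(2, 12)],
               [F(3, 12), F(6, 12)]]   # product of the marginals: H0 holds
dependent = [[F(2, 12), F(1, 12)],
             [F(2, 12), F(7, 12)]]     # same marginals, mass shifted: H1 holds

print(is_independent(independent), is_independent(dependent))  # True False
```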

Let $r_i$ and $c_j$ denote the row and column totals, as in Figure 2.18, such that:

$$r_i = \sum_{j=1}^{c} O_{ij}, \qquad c_j = \sum_{i=1}^{r} O_{ij}, \qquad n = r_1 + r_2 + \dots + r_r = c_1 + c_2 + \dots + c_c.$$
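The marginal totals are the only ingredients needed, beyond the observed counts, to build the expected counts. Here is a minimal sketch computing $r_i$, $c_j$ and $n$ for a table stored as a list of rows. The function name `margins` and the counts are hypothetical.

```python
def margins(table):
    """Return row totals r_i, column totals c_j and grand total n of an r x c table."""
    row_totals = [sum(row) for row in table]          # r_i = sum over j of O_ij
    col_totals = [sum(col) for col in zip(*table)]    # c_j = sum over i of O_ij
    n = sum(row_totals)
    # Sanity check of the identity n = r_1 + ... + r_r = c_1 + ... + c_c.
    assert n == sum(col_totals)
    return row_totals, col_totals, n


# Hypothetical 2 x 3 table of observed counts O_ij.
observed = [[10, 20, 30],
            [15, 5, 20]]
print(margins(observed))  # ([60, 40], [25, 25, 50], 100)
```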

As before, we use the test statistic:

$$T = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad \text{with} \quad E_{ij} = \frac{r_i\, c_j}{n}, \tag{5.24}$$

which has an asymptotic chi-square distribution with $df = (r - 1)(c - 1)$ degrees of freedom. Note, however, that since the row totals can vary in this situation, the exact probability associated with a given value of $T$ is even more difficult to compute than before, because a greater number of possible tables now share the same $T$.
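As a minimal, standard-library-only sketch of formula 5.24 and its asymptotic p-value, assuming a table of raw counts stored as a list of rows: the names `chi2_sf` and `chi2_independence` and the counts are hypothetical. The upper-tail chi-square probability uses the closed forms that exist for integer $df$: a finite series for even $df$, and `erfc` plus a finite series for odd $df$. This avoids a SciPy dependency. In practice, `scipy.stats.chi2_contingency` is the usual library route. For 2×2 tables, it must be called with `correction=False` to match 5.24 exactly, since by default it applies Yates' continuity correction.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2)
    #         * sum_{k=1}^{(df-1)/2} x^(k - 1/2) / (1 * 3 * ... * (2k - 1))
    s = math.sqrt(x)
    term, total = s, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return math.erfc(s / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def chi2_independence(observed):
    """Return (T, df, asymptotic p-value) of formula 5.24 for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]          # r_i
    col_totals = [sum(col) for col in zip(*observed)]    # c_j
    n = sum(row_totals)
    t = 0.0
    for i, r_i in enumerate(row_totals):
        for j, c_j in enumerate(col_totals):
            e_ij = r_i * c_j / n                         # expected count under H0
            t += (observed[i][j] - e_ij) ** 2 / e_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return t, df, chi2_sf(t, df)


# Hypothetical 2 x 3 table of observed counts (not the Programming dataset).
observed = [[10, 20, 30],
            [15, 5, 20]]
t, df, p = chi2_independence(observed)
print(f"T = {t:.3f}, df = {df}, p = {p:.4f}")  # T = 8.333, df = 2, p = 0.0155
```

An empty row or column gives some $E_{ij} = 0$ and would make the division fail. Also, the asymptotic approximation is generally regarded as unreliable when many expected counts are small.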

Example 5.12

Q: Consider the Programming dataset, containing the results of pedagogical enquiries made during the period 1986–1988 of freshmen attending the course “Programming and Computers” in the Electrotechnical Engineering Department of Porto University. Based on the evidence provided by the respective samples, is it possible to conclude that the performance obtained by the students in the final examination is independent of their previous knowledge of programming?
A: Note that we have a single population with two attributes: “previous knowledge of programming” (variable PROG) and “final examination score” (variable SCORE). In order to test the independence hypothesis for these two attributes, we