Page 215 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
196 5 Non-Parametric Tests of Hypotheses
(nominal or ordinal) or continuous. In this latter case, one must choose suitable
categorisations for the continuous variables.
The r×c contingency table for this situation is the same as shown in Figure 5.4.
The only difference is that, whereas in the previous section the rows
represented different populations and the row totals were assumed to be fixed, now
the rows represent categories of a second variable and the row totals can vary
arbitrarily, constrained only by the fact that their sum is the total number of cases.
The test is formalised as:
H0: The event “an observation is in row i” is independent of the event “the same
observation is in column j”, i.e.:
P(row i, column j) = P(row i) × P(column j), ∀ i, j.
H1: The events “an observation is in row i” and “the same observation is in
column j” are dependent, i.e.:
∃ i, j: P(row i, column j) ≠ P(row i) × P(column j).
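The factorisation stated in H0 can be illustrated with a minimal Python sketch. The function name `is_independent` and both joint probability tables are hypothetical, made up only for illustration; the check works on known probabilities and is therefore not a statistical test.

```python
import math

def is_independent(joint, tol=1e-9):
    """Return True if every cell satisfies P(row i, column j) = P(row i) * P(column j)."""
    row_p = [sum(row) for row in joint]          # marginal P(row i)
    col_p = [sum(col) for col in zip(*joint)]    # marginal P(column j)
    return all(
        math.isclose(joint[i][j], row_p[i] * col_p[j], abs_tol=tol)
        for i in range(len(joint))
        for j in range(len(joint[0]))
    )

# Hypothetical joint distributions (each sums to 1).
independent = [[0.12, 0.28],
               [0.18, 0.42]]   # marginals: rows 0.4, 0.6; columns 0.3, 0.7
dependent = [[0.30, 0.10],
             [0.10, 0.50]]     # 0.30 differs from 0.4 * 0.4

print(is_independent(independent))
print(is_independent(dependent))
```

In practice the joint probabilities are unknown and only the observed counts are available, which is why a test statistic is needed.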
Let $r_i$ denote the row totals as in Figure 2.18, such that:
$r_i = \sum_{j=1}^{c} O_{ij}$ and $n = r_1 + r_2 + \dots + r_r = c_1 + c_2 + \dots + c_c$.
As before, we use the test statistic:
$T = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, with $E_{ij} = \frac{r_i c_j}{n}$, (5.24)
which has the asymptotic chi-square distribution with df = (r – 1)(c – 1) degrees of
freedom. Note, however, that since the row totals can vary in this situation, the
exact probability associated with a certain value of T is even more difficult to
compute than before because there are a greater number of possible tables with the
same T.
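As a rough illustration of how the statistic T and its degrees of freedom are obtained from a table of observed counts, consider the following Python sketch. The function name `chi_square_independence` and the 2×3 table of counts are hypothetical and are not taken from any dataset of the book; the sketch also assumes that all row and column totals are positive.

```python
def chi_square_independence(observed):
    """Compute T, df and the expected counts for an r x c table of observed counts.

    Assumes every row total and column total is positive, so that no
    expected count is zero.
    """
    r, c = len(observed), len(observed[0])
    row_totals = [sum(row) for row in observed]          # r_i
    col_totals = [sum(col) for col in zip(*observed)]    # c_j
    n = sum(row_totals)                                  # total number of cases

    # Expected counts under independence: E_ij = r_i * c_j / n
    expected = [[row_totals[i] * col_totals[j] / n for j in range(c)]
                for i in range(r)]

    # T = sum over all cells of (O_ij - E_ij)^2 / E_ij
    t = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
            for i in range(r) for j in range(c))

    df = (r - 1) * (c - 1)
    return t, df, expected

# Hypothetical 2 x 3 table of observed counts (n = 100).
observed = [[20, 30, 10],
            [10, 20, 10]]
t, df, expected = chi_square_independence(observed)
print(t, df)
```

The observed significance would then be read from the chi-square distribution with df degrees of freedom. If SciPy is available, `scipy.stats.chi2_contingency(observed, correction=False)` should return the same statistic together with the asymptotic p-value.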
Example 5.12
Q: Consider the Programming dataset, containing results of pedagogical
enquiries made during the period 1986-1988, of freshmen attending the course
“Programming and Computers” in the Electrotechnical Engineering Department of
Porto University. Based on the evidence provided by the respective samples, is it
possible to conclude that the performance obtained by the students at the final
examination is independent of their previous knowledge on programming?
A: Note that we have a single population with two attributes: “previous knowledge
on programming” (variable PROG), and “final examination score” (variable
SCORE). In order to test the independence hypothesis of these two attributes, we