Page 208 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

fewer mistakes when the data was generated by asymmetric distributions, namely
the lognormal or exponential distribution. Taking these observations into account,
the reader should keep in mind that the statements “a data sample can be well
modelled by the normal distribution” and “a data sample comes from a population
with a normal distribution” mean entirely different things.

5.2 Contingency Tables

Contingency tables were introduced in section 2.2.3 as a means of representing
multivariate data. In sections 2.3.5 and 2.3.6, some measures of association
computed from these tables were also presented. In this section, we describe tests
of hypotheses concerning these tables.

5.2.1 The 2×2 Contingency Table

The 2×2 contingency table is a convenient formalism whenever one has two
random and independent samples obtained from two distinct populations whose
cases can be categorised into two classes, as shown in Figure 5.3. The sample sizes
are n_1 and n_2, and the observed occurrence counts are the O_ij.
This formalism is used when one wants to assess whether, based on the samples,
one can conclude that the probability of occurrence of one of the classes is
different for the two populations. It is particularly useful in clinical
research, when one wants to assess whether a specific treatment is beneficial; the
populations then correspond to “without” and “with” the treatment.

               Class 1   Class 2
Population 1   O_11      O_12      n_1
Population 2   O_21      O_22      n_2

Figure 5.3. The 2×2 contingency table with the sample sizes (n_1 and n_2) and the
observed absolute frequencies (counts O_ij).
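
As an illustration of the layout in Figure 5.3, the following minimal Python sketch builds the four counts and the two sample sizes from two lists of class labels. The function name `contingency_2x2`, the label strings and the data are hypothetical and serve only to make the structure concrete.

```python
from collections import Counter

def contingency_2x2(sample_1, sample_2, classes=("class 1", "class 2")):
    """Return [[O11, O12], [O21, O22]] from two lists of class labels."""
    table = []
    for sample in (sample_1, sample_2):
        counts = Counter(sample)          # occurrences of each label in the sample
        table.append([counts[c] for c in classes])
    return table

# Hypothetical samples: population 1 = "without" treatment, population 2 = "with".
without_treatment = ["class 1"] * 12 + ["class 2"] * 28
with_treatment = ["class 1"] * 21 + ["class 2"] * 19

table = contingency_2x2(without_treatment, with_treatment)
row_totals = [sum(row) for row in table]                       # sample sizes n1, n2
proportions = [row[0] / n for row, n in zip(table, row_totals)]  # observed class 1 fractions

print(table, row_totals, proportions)
```

The row totals reproduce the sample sizes, and the class 1 fractions are the sample estimates of the occurrence probabilities in each population.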

Let p_1 and p_2 denote the probabilities of occurrence of one of the classes, e.g.
class 1, for the populations 1 and 2, respectively. For the two-sided test, the
hypotheses are:

H_0: p_1 = p_2;
H_1: p_1 ≠ p_2.
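
One way to test these two-sided hypotheses numerically is sketched below. It assumes a pooled two-proportion z-test based on the normal approximation, which is a common choice for large samples but is an assumption of this sketch, not necessarily the statistic used in this text. The function name and the counts are hypothetical.

```python
import math

def two_proportion_z_test(o11, n1, o21, n2):
    """Two-sided pooled z-test of H0: p1 = p2 (normal approximation assumed)."""
    p1_hat = o11 / n1                      # observed class 1 fraction, population 1
    p2_hat = o21 / n2                      # observed class 1 fraction, population 2
    pooled = (o11 + o21) / (n1 + n2)       # common estimate of p under H0
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if variance == 0:
        raise ValueError("all cases fall in one class; the statistic is undefined")
    z = (p1_hat - p2_hat) / math.sqrt(variance)
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: O11 = 12 of n1 = 40, O21 = 21 of n2 = 40.
z, p_value = two_proportion_z_test(12, 40, 21, 40)
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
```

H_0 is rejected at significance level α when the returned p-value is below α. For small counts the normal approximation is unreliable and an exact test is generally preferable.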

The one-sided test is formalised as:
