Page 162 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

142 4 Parametric Tests of Hypotheses

the alternative hypothesis being that there is at least one pair with unequal means,
$\mu_i \neq \mu_j$.
We now assume that $H_0$ is assessed using two-means tests for all $\binom{c}{2}$ pairs of
the $c$ means. Moreover, we assume that every two-means test is performed at a
95% confidence level, i.e., the probability of not rejecting the null hypothesis when
true, for every two-means comparison, is 95%:

$$P(\mu_i = \mu_j \mid H_{0ij}) = 0.95, \tag{4.18}$$

where $H_{0ij}$ is the null hypothesis for the two-means test referring to the $i$ and $j$
samples.
The probability of rejecting the null hypothesis 4.17 for the c means, when it is
true, is expressed as follows in terms of the two-means tests:

$$\alpha = P(\text{reject } H_0 \mid H_0) = P(\mu_1 \neq \mu_2 \mid H_0 \text{ or } \mu_1 \neq \mu_3 \mid H_0 \text{ or } \ldots \text{ or } \mu_{c-1} \neq \mu_c \mid H_0). \tag{4.19}$$

Assuming the two-means tests are independent, we rewrite 4.19 as:

$$\alpha = 1 - P(\mu_1 = \mu_2 \mid H_0)\, P(\mu_1 = \mu_3 \mid H_0) \ldots P(\mu_{c-1} = \mu_c \mid H_0). \tag{4.20}$$

Since $H_0$ is more restrictive than any $H_{0ij}$, as it implies conditions on more than
two means, we have $P(\mu_i \neq \mu_j \mid H_{0ij}) \geq P(\mu_i \neq \mu_j \mid H_0)$, or, equivalently,
$P(\mu_i = \mu_j \mid H_{0ij}) \leq P(\mu_i = \mu_j \mid H_0)$.
Thus:

$$\alpha \geq 1 - P(\mu_1 = \mu_2 \mid H_{012})\, P(\mu_1 = \mu_3 \mid H_{013}) \ldots P(\mu_{c-1} = \mu_c \mid H_{0\,c-1,c}). \tag{4.21}$$

For instance, for $c = 3$, using 4.18 and 4.21, we obtain a Type I Error
$\alpha \geq 1 - 0.95^3 \approx 0.14$. For higher values of $c$ the Type I Error degrades rapidly.
Therefore, we need an approach that assesses the null hypothesis 4.17 in a “global”
way, instead of assessing it using individual two-means tests.
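
As a rough numerical check of the argument above, the right-hand side of 4.21 can be tabulated for several values of c. The following Python sketch is not part of the original text. The function name `pairwise_error_bound` and its parameters are hypothetical. It assumes, as in 4.20, that the two-means tests are independent and that each one is performed at the same confidence level given in 4.18.

```python
from math import comb


def pairwise_error_bound(c: int, pair_confidence: float = 0.95) -> float:
    """Right-hand side of 4.21: 1 - p**m, with m = C(c, 2) two-means tests.

    Assumption (from the text): the tests are independent and each one
    fails to reject its own null hypothesis, when true, with probability p.
    """
    n_pairs = comb(c, 2)  # number of distinct pairs among the c means
    return 1.0 - pair_confidence ** n_pairs


# Tabulate the number of pairs and the resulting value for c = 2, ..., 6.
for c in range(2, 7):
    print(f"c = {c}: pairs = {comb(c, 2):2d}, bound = {pairwise_error_bound(c):.4f}")
```

For c = 3 this reproduces the value 1 − 0.95^3 ≈ 0.14 quoted above. The value grows quickly with c because the number of pairs, c(c − 1)/2, grows quadratically.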
In the following sections we describe the analysis of variance (ANOVA)
approach, which provides a suitable methodology to test the “global” null
hypothesis 4.17. We only describe the ANOVA approach for one or two grouping
variables (effects or factors). Moreover, we only consider the so-called “fixed
factors” model, i.e., we only make inferences about the fixed categories of a
factor that are observed in the dataset, and do not address the problem of
drawing inferences about categories beyond those observed (the so-called “random
factors” model).
