4.5.2 One-Way ANOVA
4.5.2.1 Test Procedure
The one-way ANOVA test is applied when only one grouping variable is present in the dataset, i.e., one has available c independent samples, corresponding to c categories (or levels) of an effect, and wants to assess whether or not the null hypothesis of equality of the population means should be rejected. As an example, one may have three independent samples of scores obtained by students in a certain course, corresponding to three different teaching methods, and want to assess whether or not the hypothesis of equality of student performance should be rejected. In this case, we have an effect – teaching method – with three categories.
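As a minimal sketch of this test in R (the scores data frame, the factor method and the generated numbers are purely hypothetical, used only for illustration):

# Hypothetical scores of students taught by three different methods
set.seed(1)
scores <- data.frame(
  score  = c(rnorm(20, 72, 8), rnorm(20, 75, 8), rnorm(20, 70, 8)),
  method = factor(rep(c("A", "B", "C"), each = 20))
)
fit <- aov(score ~ method, data = scores)   # one-way ANOVA
summary(fit)                                # ANOVA table: sums of squares and F test

The summary lists the between-group and within-group sums of squares and the F test of equality of the three means, discussed in the following paragraphs.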
A basic assumption for the variable X being tested is that the c independent samples are obtained from populations where X is normally distributed and has equal variance. Thus, the only possible difference among the populations concerns the means, $\mu_i$. The equality of variance tests were already described in section 4.4.2. As to the normality assumption, if there are no “a priori” reasons to accept it, one can resort to the goodness of fit tests described in the following chapter.
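As a brief R sketch of these preliminary checks (reusing the hypothetical scores data frame from the example above; bartlett.test and shapiro.test are standard stats-package functions):

bartlett.test(score ~ method, data = scores)   # equality of variances across groups
by(scores$score, scores$method, shapiro.test)  # normality within each group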
In order to understand the ANOVA approach, we start by considering a single sample of size n, subdivided in c subsets of sizes $n_1, n_2, \ldots, n_c$, with averages $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_c$, and investigate how the total variance, v, can be expressed in terms of the subset variances, $v_i$. Let any sample value be denoted $x_{ij}$, the first index referring to the subset, $i = 1, 2, \ldots, c$, and the second index to the case number inside the subset, $j = 1, 2, \ldots, n_i$. The total variance is related to the total sum of squares, SST, of the deviations from the global sample mean, $\bar{x}$:

$$\mathrm{SST} = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 . \qquad 4.22$$
Adding and subtracting $\bar{x}_i$ to the deviations, $x_{ij} - \bar{x}$, we derive:

$$\mathrm{SST} = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 + \sum_{i=1}^{c} \sum_{j=1}^{n_i} (\bar{x}_i - \bar{x})^2 - 2 \sum_{i=1}^{c} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)(\bar{x} - \bar{x}_i) . \qquad 4.23$$
The last term can be proven to be zero: the factor $(\bar{x} - \bar{x}_i)$ is constant within each group, and $\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) = 0$ for every group i. Let us now analyse the other two terms.
The first term is called the within-group (or within-class) sum of squares, SSW, and represents the contribution to the total variance of the random scattering of the cases around their group means. Since it measures the experimental error, it is also called the error sum of squares, SSE.
The second term is called the between-group (or between-class) sum of squares,
SSB, and represents the contribution to the total variance of the deviations of the
group means from the global mean.
Thus:
SST = SSW + SSB. 4.24
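This decomposition can be checked numerically in R with a few lines (again using the hypothetical scores data frame from the earlier sketch):

x_bar     <- mean(scores$score)                                  # global sample mean
grp_means <- tapply(scores$score, scores$method, mean)           # group means
grp_sizes <- tapply(scores$score, scores$method, length)         # group sizes n_i
SST <- sum((scores$score - x_bar)^2)                             # total sum of squares
SSW <- sum((scores$score - ave(scores$score, scores$method))^2)  # within-group
SSB <- sum(grp_sizes * (grp_means - x_bar)^2)                    # between-group
all.equal(SST, SSW + SSB)                                        # TRUE (up to rounding)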