Page 191 - Intermediate Statistics for Dummies
Part III: Comparing Many Means with ANOVA
Breaking down the variance into sums of squares
The first step of the F-test is splitting up the variability in the y variable into
portions that define where the variability is coming from. The term analysis
of variance is a great description for exactly how you conduct a test of k
population means. With the overall goal of testing whether k population (or
treatment) means are equal, you take a random sample from each of the k
populations. You first put all the data together into one big group and
measure how much total variability there is; this variability is called the
sum of squares total, or SSTO. If the data are really diverse, SSTO is large.
If the data are very similar, SSTO is small.
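To make "large" and "small" concrete, here's a minimal Python sketch. It assumes that SSTO adds up the squared distances between each data value and the overall mean. The function name `ssto` and both data sets are made up for illustration; they don't come from any example in this chapter.

```python
def ssto(values):
    """Total sum of squares: squared distances from the overall mean, added up."""
    overall_mean = sum(values) / len(values)
    return sum((x - overall_mean) ** 2 for x in values)

# Two hypothetical pooled data sets that share the same overall mean (10)
similar = [9, 10, 10, 11]   # values huddle close to the mean
diverse = [2, 6, 14, 18]    # values are spread far from the mean

ssto_similar = ssto(similar)
ssto_diverse = ssto(diverse)
print(ssto_similar, ssto_diverse)
```

Both data sets have the same mean, so any difference in SSTO comes purely from how spread out the values are.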
Now the total variability in the combined data set (SSTO) can be split into
two parts:
SST: The variability between the groups, known as the sum of squares
for treatment
SSE: The variability within the groups, known as the sum of squares
for error
This splitting up of the variability in your data results in one of the most
important equalities in ANOVA. That equality is SSTO = SST + SSE.
The formula for SSTO is the numerator of the formula for $s^2$, the variance of a
single data set, so $\text{SSTO} = \sum_i \sum_j (x_{ij} - \bar{x})^2$, where $x_{ij}$ is the $j$th value
in the sample from the $i$th population and $\bar{x}$ is the overall mean of all the
data. SSTO represents the total squared distance between the data values and
their overall mean. The formula for SST is $\text{SST} = \sum_i n_i (\bar{x}_i - \bar{x})^2$, where $n_i$ is
the size of the sample coming from the $i$th population and $\bar{x}_i$ is the mean of
that sample. SST represents the total squared distance between the means from
each sample and the overall mean. The formula for SSE is
$\text{SSE} = \sum_i \sum_j (x_{ij} - \bar{x}_i)^2$, where $x_{ij}$ is the $j$th value in the sample from the $i$th
population and $\bar{x}_i$ is the mean of the sample coming from the $i$th population.
This formula represents the total squared distance between the values in each
sample and their corresponding sample means. Using algebra, you can show (with
some serious elbow grease) that SSTO = SST + SSE.
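If you'd rather let a computer supply the elbow grease, the following Python sketch computes all three sums of squares straight from their definitions and checks the equality numerically. The function name `sums_of_squares` and the three small samples are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def sums_of_squares(samples):
    """Return (SSTO, SST, SSE) for a list of samples, one list per population."""
    # Pool every value into one big group to get the overall mean
    all_values = [x for sample in samples for x in sample]
    overall_mean = sum(all_values) / len(all_values)

    # SSTO: squared distance of every value from the overall mean
    ssto = sum((x - overall_mean) ** 2 for x in all_values)

    sst = 0.0
    sse = 0.0
    for sample in samples:
        sample_mean = sum(sample) / len(sample)
        # SST: squared distance of each sample mean from the overall mean,
        # weighted by that sample's size
        sst += len(sample) * (sample_mean - overall_mean) ** 2
        # SSE: squared distance of each value from its own sample mean
        sse += sum((x - sample_mean) ** 2 for x in sample)
    return ssto, sst, sse

# Hypothetical data: three samples of unequal size
samples = [[4, 5, 6], [7, 8, 9, 10], [1, 2, 3]]
ssto, sst, sse = sums_of_squares(samples)
print(ssto, sst, sse)
```

For these made-up samples, most of the total variability lands in SST because the three sample means sit far apart while the values within each sample stay close together.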
The Minitab output for the watermelon seed spitting contest for the four age
groups is shown in Figure 9-3. Under the Source column of the ANOVA table,
you see Factor listed in row one. The factor variable (as described by Minitab)
represents the treatment or population variable. In column three of the Factor
row, you see the SST, which is equal to 89.75. In the Error row (row two), you
locate the SSE in column three, which equals 56.80. In row three (Total), column
three, you see the SSTO, which is 146.55. Using the values of SST, SSE, and
SSTO from the Minitab output, you can verify that SST + SSE = SSTO.
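The check takes one line of arithmetic. Here it is as a short Python sketch, using only the three values quoted above. The tolerance is an assumption on my part: it allows for the output being rounded to two decimal places.

```python
import math

# Values read from the ANOVA table described above
sst = 89.75    # Factor row
sse = 56.80    # Error row
ssto = 146.55  # Total row

# Compare with a small tolerance because the printed values are rounded
identity_holds = math.isclose(sst + sse, ssto, abs_tol=0.005)
print(identity_holds)
```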