Page 230 - Statistics II for Dummies
214 Part III: Analyzing Variance with ANOVA
Instead of calling the sum of squares for the regression model SST as is done
in ANOVA, statisticians call it SSR for sum of squares regression. Consider SSR
to be equivalent to the SST from ANOVA. You need to know the difference
because computer output lists the sums of squares for the regression model
as SSR, not SST.
To summarize the sums of squares as they apply to regression, you have
SSTO = SSR + SSE where
✓ SSTO measures the variability in the observed y-values around their
mean. Dividing this value by n – 1 gives the variance of the y-values.
✓ SSE represents the variability between the predicted values for y (the
values on the line) and the observed y-values. SSE represents the
variability left over after the line has been fit to the data.
✓ SSR measures the variability in the predicted values for y (the values on
the line) from the mean of y. SSR is the sum of squares due to the
regression model (the line) itself.
Minitab calculates all the sums of squares for you as part of the regression
analysis. You can see this calculation in the section “Bringing regression to
the ANOVA table.”
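If you don't have Minitab handy, the three sums of squares can be sketched in a few lines of Python. The data below are made up purely for illustration (they aren't from this book), and the variable names are arbitrary. The sketch fits a least-squares line by hand, computes SSTO, SSR, and SSE from their definitions, and lets you see that SSR and SSE add up to SSTO.

```python
# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope (b1) and intercept (b0) for the line y_hat = b0 + b1 * x.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# SSTO: observed y-values around their mean.
ssto = sum((yi - y_bar) ** 2 for yi in y)
# SSR: predicted values (on the line) around the mean of y.
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
# SSE: observed y-values around the predicted values.
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

print(f"SSTO      = {ssto:.4f}")
print(f"SSR       = {ssr:.4f}")
print(f"SSE       = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```

The last two printed lines should agree with the first up to rounding, which is just SSTO = SSR + SSE in action.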
Dividing up the degrees of freedom
In ANOVA, you test a model for the treatment (population) means by using an
F-test, which is $F = \text{MST}/\text{MSE}$. To get MST (the mean sum of squares for treatment),
you take SST (the sum of squares for treatment) and divide by its degrees of
freedom. You do the same with MSE (that is, take SSE, the sum of squares for
error, and divide by its degrees of freedom). The questions now are, what do
those degrees of freedom represent, and how do they relate to regression?
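To make the arithmetic concrete, here is a minimal Python sketch of the F-statistic for a one-way ANOVA. The three groups are hypothetical numbers chosen so the arithmetic is easy to follow. The sketch assumes the usual one-way ANOVA degrees of freedom (k – 1 for treatment and n – k for error, where k is the number of groups), which this passage hasn't spelled out yet.

```python
# Hypothetical measurements for three treatment groups (illustration only).
groups = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0],
]

k = len(groups)                      # number of treatments
n = sum(len(g) for g in groups)      # total sample size
grand_mean = sum(sum(g) for g in groups) / n
group_means = [sum(g) / len(g) for g in groups]

# SST: variability of the group means around the grand mean.
sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
# SSE: variability of the observations around their own group mean.
sse = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

# Assumed degrees of freedom for one-way ANOVA.
df_treatment = k - 1
df_error = n - k

mst = sst / df_treatment   # mean sum of squares for treatment
mse = sse / df_error       # mean sum of squares for error
f_stat = mst / mse

print(f"MST = {mst:.4f}, MSE = {mse:.4f}, F = {f_stat:.4f}")
```

Each mean square is simply a sum of squares divided by its own degrees of freedom, and F is the ratio of the two.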
Degrees of freedom in ANOVA
In ANOVA, the degrees of freedom for SSTO is n – 1, which represents the
sample size minus one. In the formula for SSTO, $\sum_{i=1}^{n} (y_i - \bar{y})^2$, you see there are
n observed y-values minus one mean. In a very general way, that’s where the
n – 1 comes from.
Note that if you divide SSTO by n – 1, you get $s_y^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$, the variance in the
y-values. This calculation makes good sense because the variance measures
the total variability in the y-values.
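You can check the link between SSTO and the variance yourself. The short Python sketch below uses made-up y-values (not from this book), computes SSTO from its definition, divides by n – 1, and compares the result with the sample variance from Python's standard `statistics` module.

```python
import statistics

# Hypothetical y-values, invented for illustration only.
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(y)
y_bar = sum(y) / n

# SSTO: squared distances of the observed y-values from their mean.
ssto = sum((yi - y_bar) ** 2 for yi in y)

# Dividing SSTO by its degrees of freedom gives the sample variance.
variance_from_ssto = ssto / (n - 1)
variance_builtin = statistics.variance(y)  # sample variance, divisor n - 1

print(f"SSTO / (n - 1)        = {variance_from_ssto:.4f}")
print(f"statistics.variance   = {variance_builtin:.4f}")
```

The two printed values should match up to rounding, because `statistics.variance` also divides by n – 1.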

