Page 229 - Statistics II for Dummies
Chapter 12: Regression and ANOVA: Surprise Relatives! 213
Partitioning variability by using SSTO, SSE, and SST for ANOVA
ANOVA is all about partitioning the total variability in the y-values into sums
of squares (find all the info you ever need on one-way ANOVA in Chapter
9). The key idea is that SSTO = SST + SSE, where SSTO is the total variability
in the y-values; SST measures the variability explained by the model (also
known as the treatment, or x variable in this case); and SSE measures the
variability due to error (what’s left over after the model is fit).
Following are the corresponding formulas for SSTO, SSE, and SST, where \bar{y} is the mean of the y's, y_i is each observed value of y, and \hat{y}_i is each predicted value of y from the ANOVA model:

SSTO = \sum_{i} (y_i - \bar{y})^2

SSE = \sum_{i} (y_i - \hat{y}_i)^2

SST = \sum_{i} (\hat{y}_i - \bar{y})^2
Use these formulas to calculate the sums of squares for ANOVA. (Minitab
does this for you when it performs ANOVA.) Keep these values of SSTO, SST,
and SSE. You’ll use them to compare to the results from regression.
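If you want to check the arithmetic by hand rather than rely on software, here is a minimal Python sketch of the partition. The three treatment groups are invented illustration data; in one-way ANOVA each predicted value \hat{y}_i is simply its group's mean:

```python
# Minimal sketch: partitioning total variability for one-way ANOVA.
# The three treatment groups below are made-up illustration data.
groups = [
    [4.0, 5.0, 6.0],   # treatment 1
    [7.0, 8.0, 9.0],   # treatment 2
    [1.0, 2.0, 3.0],   # treatment 3
]

all_y = [y for g in groups for y in g]
grand_mean = sum(all_y) / len(all_y)

# SSTO: total variability of the y-values around the grand mean
ssto = sum((y - grand_mean) ** 2 for y in all_y)

# SST: variability explained by the treatments
# (each predicted value is its group's mean)
sst = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# SSE: variability within groups, left over after the model is fit
sse = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)

print(ssto, sst, sse)   # SSTO should equal SST + SSE
```

Minitab reports these same three sums of squares in its ANOVA table, so a hand check like this is mainly for building intuition.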
Finding sums of squares for regression
In regression, you measure the deviations in the y-values by taking each y_i minus its mean, \bar{y}. Square each result and add them all up, and you have SSTO. Next, take the residuals, which represent the difference between each y_i and its estimated value from the model, \hat{y}_i. Square the residuals and add them up, and you get the formula for SSE.
After you calculate SSTO and SSE, you need the bridge between them: a formula that connects the variability in the y_i's (SSTO) and the variability in the residuals after fitting the regression line (SSE). That bridge is called the sum of squares for regression, or SSR (equivalent to SST in ANOVA). In regression, \hat{y}_i represents the predicted value of y_i based on the
regression model. These are the values on the regression line. To assess how
much this regression line helps to predict the y-values, you compare it to the
model you’d get without any x variable in it.
Without any other information, the only thing you can do to predict y is look at the average, \bar{y}. So, SSR compares the predicted value from the regression line to the predicted value from the flat line (the mean of the y's) by subtracting them. The result is \hat{y}_i - \bar{y}. Square each result and sum them all up, and you get the formula for SSR, \sum_{i} (\hat{y}_i - \bar{y})^2, which is the same as the formula for SST in ANOVA. Voilà!
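The whole regression decomposition can be sketched in a few lines of Python. The (x, y) pairs are invented illustration data, and the fitted values come from the usual least-squares slope and intercept:

```python
# Minimal sketch: fit a least-squares line by hand, then split
# SSTO into SSR + SSE. The (x, y) pairs are made-up illustration data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# Predicted values: the points on the regression line
y_hat = [b0 + b1 * x for x in xs]

ssto = sum((y - y_bar) ** 2 for y in ys)              # total variability
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # residual variability
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained by the line

print(ssto, sse, ssr)   # SSTO should equal SSR + SSE
```

Running this confirms the bridge: the explained piece (SSR) and the leftover piece (SSE) add back up to the total (SSTO), exactly as SST and SSE do in ANOVA.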

