Page 100 - Intermediate Statistics for Dummies
Chapter 4: Getting in Line with Simple Linear Regression
other words, you can discover that looking at errors helps you assess the fit of
the model and diagnose problems that caused a bad fit, if that was the case.
Finding the residuals
A residual is the difference between the observed value of y (from the data
set) and the predicted value of y (from the best-fitting line). Specifically,
for any data point, you take its observed y-value (from the data) and subtract
the expected y-value (from the line). If the residual is large (far from zero
in either direction), the line doesn’t fit well in that spot. If the residual
is small (close to zero), the line fits well in that spot.
For example, suppose you have a point in your data set (2, 4) and the equation
of the best-fitting line is y = 2x + 1. The expected value of y in this case
is 2 * 2 + 1 = 5. The observed value of y from the data set is 4. Taking the
observed value minus the expected value, you get 4 – 5 = –1. The residual for
that particular data point (2, 4) is –1. If you instead observe a y-value of 6
at x = 2 and use the same straight line to estimate y, then the residual would
be 6 – 5 = +1.
In general, a positive residual means you underestimated y at that point, and
a negative residual means you overestimated y at that point.
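The arithmetic in this example can be sketched in a few lines of Python. The
function names predict and residual are made up for illustration and don't come
from any statistics package; the line y = 2x + 1 and the points (2, 4) and
(2, 6) are the ones from the example above.

```python
def predict(x, slope=2.0, intercept=1.0):
    """Predicted y on the line y = slope * x + intercept (default: y = 2x + 1)."""
    return slope * x + intercept


def residual(x, y_observed, slope=2.0, intercept=1.0):
    """Residual = observed y (from the data) minus predicted y (from the line)."""
    return y_observed - predict(x, slope, intercept)


# The point (2, 4): the line predicts 5, so the line overestimated y.
print(residual(2, 4))  # -1.0

# The point (2, 6): the line predicts 5, so the line underestimated y.
print(residual(2, 6))  # 1.0
```

A negative result matches the "overestimated" case and a positive result
matches the "underestimated" case.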
Standardizing the residuals
To make interpreting the residuals easier, statisticians typically standardize
them; that is, subtract the mean of the residuals (zero) and divide by the
standard deviation of all the residuals. The residuals are a data set just like
any other data set, so you can find their mean and standard deviation like you
always do. Standardizing just means converting each residual to a Z-score, so
you can see where it would fall on the standard normal distribution.
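As a rough sketch of that calculation, the Python below converts a list of
residuals to Z-scores using only the standard library. The residual values are
made up for illustration, and the function name standardize is hypothetical.
This follows the simple recipe described here; statistical packages may adjust
the divisor for each point, so their standardized residuals can differ
slightly from these.

```python
from statistics import mean, stdev


def standardize(residuals):
    """Convert each residual to a Z-score: (value - mean) / standard deviation."""
    m = mean(residuals)
    s = stdev(residuals)  # sample standard deviation (n - 1 in the denominator)
    return [(r - m) / s for r in residuals]


# Made-up residuals, for illustration only; their mean is zero.
residuals = [-1.0, 1.0, 0.5, -0.5, 0.0]
z_scores = standardize(residuals)

for r, z in zip(residuals, z_scores):
    print(f"residual {r:+.1f} -> standardized {z:+.2f}")
```

After standardizing, the values have a mean of 0 and a standard deviation of 1,
which is what lets you compare them against the familiar Z-score cutoffs.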
Making residual plots
You can plot the residuals on a graph called a residual plot. (If you’ve stan-
dardized the residuals, you call it a standardized residual plot.) Figure 4-4
shows Minitab output for a variety of standardized residual plots, all getting
at the same idea: checking to be sure the conditions of the simple linear
regression model are met.
Checking normality
If the condition of normality is met, you can see on the residual plot lots of
(standardized) residuals close to zero; as you move farther and farther away
from zero, you can see fewer and fewer residuals. Note: A standardized resid-
ual at or beyond +3 or –3 is something you shouldn’t expect to see. If this
occurs, you can consider that point an outlier, which warrants further investi-
gation. (For more on outliers, see the section “Scoping for outliers.”)
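The ±3 rule of thumb in the note above is easy to check by machine. This is a
minimal sketch, assuming you already have a list of standardized residuals;
the values shown and the function name flag_outliers are hypothetical.

```python
def flag_outliers(std_residuals, cutoff=3.0):
    """Return (index, value) pairs at or beyond +cutoff or -cutoff."""
    return [(i, z) for i, z in enumerate(std_residuals) if abs(z) >= cutoff]


# Hypothetical standardized residuals, for illustration only.
std_residuals = [0.2, -1.1, 3.4, 0.7, -3.0, 1.9]

print(flag_outliers(std_residuals))  # [(2, 3.4), (4, -3.0)]
```

The comparison uses "at or beyond," so a value of exactly –3.0 is flagged too.
A flagged point isn't automatically an error; it's a point to investigate.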
The residuals should also occur at random — some above the line, some below
the line. If a pattern occurs in the residuals, the line may not be fitting right.