Page 92 - Statistics II for Dummies
76 Part II: Using Different Types of Regression to Make Predictions
If you look at only one residual plot, choose the one in the upper-right corner
of Figure 4-6, the plot of the fitted values (the values of y on the line) versus the
standardized residuals. Most problems with model fit will show up on that plot
because a residual is defined as the difference between the observed value of y
and the fitted value of y. In a perfect world, every observed value would fall
right on the line, so every residual would be zero. A large residual (such as the
one where the estimated textbook weight is 20 pounds for students averaging
142 pounds; see Figure 4-1) shows up as a point far from zero. This graph also
shows you deviations from the overall pattern of the line; for example, if the
large residuals cluster at the extremes of this graph (very low or very high
fitted values), the line isn't fitting well in those areas.
On balance, you can say this line fits well at least for grades 1 through 11.
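The following Python sketch shows how fitted values, residuals, and standardized residuals relate for a simple linear regression. The data are made up purely for illustration (they are not the data behind Figure 4-1). The standardized residual here uses a common form, e / (s·√(1 − h)), which adjusts each residual for that point's leverage h. Your software may define it slightly differently. Flagging |standardized residual| > 2 as "large" is a rule of thumb assumed for this example, not a fixed standard.

```python
import math

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for the line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def standardized_residuals(x, y):
    """Return (fitted value, residual, standardized residual) for each point."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    # Residual = observed y minus fitted y (the value on the line)
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Residual standard error s, using n - 2 degrees of freedom
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    rows = []
    for xi, fi, e in zip(x, fitted, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx  # leverage of this point
        rows.append((fi, e, e / (s * math.sqrt(1 - h))))
    return rows

# Hypothetical data: x = average student weight, y = textbook weight (pounds)
x = [50, 58, 66, 75, 85, 96, 108, 120, 133, 145]
y = [8, 9, 11, 12, 13, 15, 16, 18, 19, 27]

for fi, e, z in standardized_residuals(x, y):
    flag = "  <-- large" if abs(z) > 2 else ""
    print(f"fitted={fi:6.2f}  residual={e:6.2f}  standardized={z:6.2f}{flag}")
```

Plotting the first column against the third gives the fitted-values-versus-standardized-residuals plot described above. Points far from zero, or a run of large residuals at either end, are what you look for.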
Using r² to measure model fit
One important way to assess how well the model fits is to use a statistic
called the coefficient of determination, or r². This statistic takes the value of
the correlation, r, and squares it, giving you a number between 0 and 1 that
you can read as a percentage. You interpret r² as the percentage of variability
in the y variable that's explained by, or due to, its relationship with the
x variable.
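Here is a short Python sketch, using made-up data, that computes r and squares it. It then checks the result against another common way of writing the same quantity, 1 − SSE/SST: one minus the share of y's variability left over in the residuals. For a simple linear regression with one x variable and an intercept, the two agree. The function names are just for this example.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient r between x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared_from_fit(x, y):
    """1 - SSE/SST for the least-squares line: share of y's variability explained by x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # left unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                        # total variability
    return 1 - sse / sst

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

r = correlation(x, y)
print(f"r = {r:.4f}, r squared = {r**2:.4f} ({r**2:.1%} of y's variability explained)")
print(f"1 - SSE/SST = {r_squared_from_fit(x, y):.4f}")
```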
The y-values of the data you collect have a great deal of variability in and
of themselves. You look for another variable (x) that helps you explain that
variability in the y-values. After you put that x variable into the model and
find that it’s highly correlated with y, you want to find out how well this
model did at explaining why the values of y are different.
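One way to make "how well this model did at explaining" precise is to split the total variability in y into a part explained by the line and a leftover (residual) part. This holds for a least-squares line with an intercept. Here ŷᵢ is the fitted value for observation i and ȳ is the mean of the y-values. The labels SST, SSR, and SSE are standard but are added here, and some software names them differently.

```latex
\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{\text{SST (total)}}
  = \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{\text{SSR (explained by } x\text{)}}
  + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{\text{SSE (residual)}},
\qquad
r^2 = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}}
```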
Note that you have to interpret r² using different standards than those for
interpreting r. Squaring a number between –1 and +1 gives a result closer to
zero (except for 0 and +1, which stay the same, and –1, which becomes +1), so
an r² of 0.49 isn't too bad: it's the square of r = 0.7 (or r = –0.7), which is a
fairly strong correlation.
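A quick Python check of the squaring effect, using a few arbitrary correlation values, shows three things. r² is never larger in size than r. Negative and positive correlations of the same strength give the same r². And r = ±0.7 gives r² = 0.49.

```python
def r_squared_table(correlations):
    """Map each correlation r to its square, r squared."""
    return {r: r * r for r in correlations}

table = r_squared_table([-1.0, -0.7, -0.5, 0.0, 0.3, 0.5, 0.7, 0.9, 1.0])
for r, r2 in table.items():
    # r squared <= |r| whenever -1 <= r <= 1, with equality only at r = -1, 0, or +1
    print(f"r = {r:+.1f}   r squared = {r2:.2f}")
```

For example, r = 0.3 squares to only 0.09. So a modest-looking r² can come from a correlation that is noticeably larger in size.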
The following are some general guidelines for interpreting the value of r²:
✓ If the model containing x explains a lot of the variability in the y-values,
then r² is high (values in the 80 to 90 percent range are considered
extremely high). Values like 0.70 are still considered fairly high. A high
percentage of variability means that the line fits well because there's
not much left to explain about the value of y beyond x and its
relationship to y. So a larger value of r² is a good thing.
✓ If the model containing x doesn't help much in explaining the differences
in the y-values, then the value of r² is small (closer to zero; roughly
between 0.00 and 0.30). The model, in this case, doesn't fit well. You
need a variable other than the one you already tried to explain y.
✓ Values of r² that fall in the middle (between, say, 0.30 and 0.70) mean
that x does help somewhat in explaining y, but it doesn't do the job
09_466469-ch04.indd 76 7/24/09 10:20:39 AM