Page 102 - Intermediate Statistics for Dummies
Chapter 4: Getting in Line with Simple Linear Regression
middle). The problem again seems to be the residual of –3, which skews the
histogram to the left.
The lower-right plot of Figure 4-4 plots the residuals in the order presented in
the data set in Table 4-1. Because the data was ordered already, the lower-
right residual plot looks like the upper-right residual plot in Figure 4-4, except
the dots are connected. This lower-right residual plot makes the residual
of –3 stand out even more.
Checking the spread of the y’s for each x
The graph in the upper-right corner of Figure 4-4 also addresses the
homoscedasticity condition. If the condition is met, then the residuals for
every x-value have about the same spread. If you draw a vertical line
through each x-value, the residuals have about the same spread (standard
deviation) each time, except at the last x-value, which again represents
grade twelve. Overall, that means the condition of equal spread in the
y-values is met for the backpack example.
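If you'd rather check the spread with numbers than by eye, here is a minimal sketch in Python. It groups the residuals by x-value and reports the standard deviation within each group. The data values and the function name residual_spread_by_x are made up for illustration; they are not the backpack data from Table 4-1.

```python
from collections import defaultdict
from statistics import pstdev

def residual_spread_by_x(xs, residuals):
    """Return {x: standard deviation of the residuals observed at that x}."""
    groups = defaultdict(list)
    for x, e in zip(xs, residuals):
        groups[x].append(e)
    # pstdev is used so that a group with a single residual still gets a value (0.0)
    return {x: pstdev(es) for x, es in sorted(groups.items())}

# Hypothetical x-values and residuals (assumed for illustration only)
grades = [1, 1, 1, 2, 2, 2, 3, 3, 3]
residuals = [-1.0, 0.0, 1.0, -1.2, 0.1, 1.1, -0.9, 0.0, 0.9]

spread = residual_spread_by_x(grades, residuals)
for x, sd in spread.items():
    print(f"x = {x}: spread of residuals = {sd:.2f}")
```

Roughly equal values from one x to the next are consistent with the equal-spread condition; one value that is much larger or smaller than the rest points to the x-value worth a closer look.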
If you look at only one residual plot, choose the one in the upper-right corner
of Figure 4-4, the plot of the fitted values (the values of y on the line) versus
the standardized residuals. Most problems with model fit pop up on that plot
because a residual is defined as the difference between the observed value
of y and the fitted value of y. In a perfect world, all the fitted values have no
residual at all; a large residual (such as the one where the estimated weight is
20 pounds for twelfth graders; see Figure 4-4) is indicated by a point far off
from zero. This graph also shows you deviations from the overall pattern of
the line; for example, if large residuals are on the extremes of this graph (very
low or very high fitted values), that shows the line isn’t fitting in those areas.
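To see where the numbers on that plot come from, here is a small Python sketch that fits a least-squares line and then computes the fitted values, the residuals (observed y minus fitted y), and the standardized residuals. The data are hypothetical, and the standardized residual uses one common definition (the residual divided by its estimated standard deviation, which accounts for leverage); your software may use a slightly different one.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def residual_table(xs, ys):
    """Return fitted values, residuals, and standardized residuals."""
    n = len(xs)
    slope, intercept = fit_line(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    # A residual is the observed y minus the fitted y
    residuals = [y - f for y, f in zip(ys, fitted)]
    # Residual standard error, with n - 2 degrees of freedom
    s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # Leverage of each point in simple linear regression
    leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    standardized = [e / (s * sqrt(1 - h)) for e, h in zip(residuals, leverage)]
    return fitted, residuals, standardized

# Hypothetical data (assumed for illustration; the last point sits off the line)
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.1, 5.9, 8.2, 9.9, 9.0]

fitted, residuals, std_resid = residual_table(xs, ys)
for f, e, z in zip(fitted, residuals, std_resid):
    print(f"fitted = {f:6.2f}   residual = {e:6.2f}   standardized = {z:6.2f}")
```

Plotting the fitted column against the standardized column gives a plot of the same kind as the upper-right panel described above; points far from zero are the ones the line fits poorly.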
Using r² to measure model fit
One important way to assess how well the model fits is to measure the value
of r², where r is the correlation coefficient. Statisticians measure how well a
model fits by looking at what percentage of the variability in y is explained by
the model.
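As a rough illustration of "percentage of variability explained," the Python sketch below computes the correlation r for a small hypothetical data set, squares it, and compares the result with 1 minus (unexplained variability / total variability) from the least-squares line. In simple linear regression the two numbers agree. The data and function names are assumptions made for this example.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r between xs and ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

def explained_fraction(xs, ys):
    """Fraction of the variability in y explained by the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Variability left over after fitting the line (sum of squared residuals)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variability of y around its mean
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical data (assumed for illustration only)
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.1, 5.9, 8.2, 9.9, 9.0]

r = correlation(xs, ys)
explained = explained_fraction(xs, ys)
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}, explained fraction = {explained:.3f}")
```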
The y-values of the data you collect have a great deal of variability in and of
themselves. You look for another variable (x) that helps you explain that
variability in the y-values. After you put that x variable into the model, and you
find it’s highly correlated with y, you want to find out how well this model did
at explaining why the values of y are different.
As it turns out, the value of r² gives you that measure of model fit. Because
squaring a number between 0 and +1 makes the result smaller (except for
0 and +1), how do you interpret r²? A value of r = +0.9 or –0.9 is quite high;