Page 103 - Intermediate Statistics for Dummies
Part II: Making Predictions by Using Regression
note that when you square either one of them, you get 0.81, which you
should also interpret as being high.
The following are some general guidelines for interpreting the value of r²:
If the model containing x explains a lot of the variability in the y-values,
then r² is high (values in the 80 to 90 percent range are considered
extremely high). Values like 0.70 are still considered fairly high. A high
percentage of explained variability means that the line fits well, because
not much is left to explain about the value of y beyond x and its
relationship to y. So a larger value of r² is a good thing.
If the model containing x doesn’t help much in explaining the differences
in the y-values, then the value of r² is small (closer to zero; say, roughly
between 0.00 and 0.30). The model, in this case, would not fit well. To
explain y, you need a variable other than the one you already tried.
Values of r² that fall in the middle (between, say, 0.30 and 0.70) mean
that x does help somewhat in explaining y, but it doesn’t do the job well
enough on its own. In this case, statisticians would try to add one or
more variables to the model to help explain y more fully as a group (read
more about this in Chapter 5).
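The guidelines above can be sketched as a small helper. This is only an illustration: the function name `interpret_r_squared` is hypothetical, and the text gives the cutoffs 0.30 and 0.70 as rough guides, so treating them as hard boundaries (and deciding which band a boundary value lands in) is an assumption of this sketch.

```python
def interpret_r_squared(r_squared: float) -> str:
    """Map an r-squared value to the rough bands described in the text.

    The cutoffs 0.30 and 0.70 are approximate guidelines, not strict rules;
    which band a boundary value falls in is an assumption of this sketch.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r-squared must be between 0 and 1")
    if r_squared >= 0.70:
        return "high: x explains much of the variability in y"
    if r_squared > 0.30:
        return "middle: x helps somewhat; consider adding variables"
    return "low: the model does not fit well; try another variable"


# Try one value from each band.
for value in (0.81, 0.50, 0.10):
    print(value, "->", interpret_r_squared(value))
```

In practice, the bands are a starting point for judgment rather than a pass/fail test.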
For the textbook weight example, the value of r (the correlation coefficient)
is 0.93. Squaring this result, you get r² = 0.8649. That number means
approximately 86 percent of the variability you find in average textbook
weights for all students (y-values) is explained by the average student
weight (x-values).
This percentage tells you that the model of using student weight to estimate
textbook weight is a good bet.
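To see why squaring r gives a "percent of variability explained," the following sketch computes r² two ways for a simple linear regression: by squaring the correlation, and as 1 minus the leftover (residual) variability divided by the total variability in y. The data values and the function names are made up for illustration and are not the book's data set; only the final line uses the book's r = 0.93.

```python
import math


def sums_of_squares(xs, ys):
    """Return Sxx, Syy, Sxy and the two means for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy, mean_x, mean_y


def correlation(xs, ys):
    """Correlation coefficient r."""
    sxx, syy, sxy, _, _ = sums_of_squares(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def explained_fraction(xs, ys):
    """Fraction of the variability in y explained by the least-squares line."""
    sxx, syy, sxy, mean_x, mean_y = sums_of_squares(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals: the variability the line leaves unexplained.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / syy


# Hypothetical data, made up for illustration (not from the book).
student_weights = [100, 120, 140, 160, 180]
textbook_weights = [10, 13, 14, 18, 19]

r = correlation(student_weights, textbook_weights)
# The two numbers printed here agree up to floating-point rounding.
print(r ** 2, explained_fraction(student_weights, textbook_weights))

# The book's figure: r = 0.93 gives r-squared of 0.8649.
print(round(0.93 ** 2, 4))
```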
In the case of simple linear regression, you have only one x variable, but in
Chapter 5, you can see models that contain more than one x variable. In this
situation, you use r² to help sort out the contributions each individual
variable brings to the model.
Scoping for outliers
Sometimes life isn’t perfect (oh really?), and you may find a residual in your
otherwise tidy data set that totally sticks out. Such a residual is called an
outlier; that is, its standardized value is at or beyond +3 or –3. It threatens
to blow the conditions of your regression model and send you crying to your
professor.
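As a rough sketch of the ±3 rule, the code below fits a least-squares line, standardizes each residual, and reports the positions of any at or beyond the cutoff. It assumes the simplest standardization, dividing each residual by the residual standard error sqrt(SSE / (n − 2)); statistical software often also adjusts for leverage, so its values may differ slightly. The function names and the data are hypothetical.

```python
import math


def standardized_residuals(xs, ys):
    """Fit a least-squares line and divide each residual by the
    residual standard error, sqrt(SSE / (n - 2)).

    Assumption: no leverage adjustment is applied, and the fit is not
    perfect (a perfect fit would make the standard error zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    std_error = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / std_error for e in residuals]


def flag_outliers(xs, ys, cutoff=3.0):
    """Return the positions whose standardized residual is at or beyond
    +cutoff or -cutoff."""
    z_values = standardized_residuals(xs, ys)
    return [i for i, z in enumerate(z_values) if abs(z) >= cutoff]


# Hypothetical data: 20 points on the line y = 2x + 1,
# with one value pushed far off the line.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
ys[10] += 50

print(flag_outliers(xs, ys))  # prints [10]
```

A flagged position is a prompt to look at that data value more closely, not a verdict that it should be thrown out.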
Before you panic, the best thing to do is to examine that outlier more closely.
First, can you find an error in that data value? Did someone report her age as
642, for instance? (After all, mistakes do happen.) If you do find a certifiable