Page 103 - Intermediate Statistics for Dummies
                                         Part II: Making Predictions by Using Regression
note that when you square either one of them, you get 0.81, which you
should also interpret as being high.
The following are some general guidelines for interpreting the value of r²:
• If the model containing x explains a lot of the variability in the y-values,
  then r² is high (a value in the 80 to 90 percent range is considered
  extremely high). Values like 0.70 are still considered fairly high. A high
  percentage of explained variability means that the line fits well because
  there is not much left to explain about the value of y other than using x
  and its relationship to y. So a larger value of r² is a good thing.
• If the model containing x doesn’t help much in explaining the difference
  in the y-values, then the value of r² is small (closer to zero; say between
  0.00 and 0.30 roughly). The model, in this case, would not fit well. You
  need another variable to explain y other than the one you already tried.
• Values of r² that fall in the middle (between, say, 0.30 and 0.70) mean
  that x does help somewhat in explaining y, but it doesn’t do the job well
  enough on its own. In this case, statisticians would try to add one or
  more variables to the model to help explain y more fully as a group (read
  more about this in Chapter 5).
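
The rough cutoffs in these guidelines can be written down as a small lookup. The following is only a sketch: the function name `interpret_r_squared` is made up for illustration, and the exact handling of the boundary values 0.30 and 0.70 is an assumption, because the text gives the ranges only approximately.

```python
def interpret_r_squared(r_squared: float) -> str:
    """Label an r-squared value using the rough cutoffs from the guidelines."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r-squared must lie between 0 and 1")
    if r_squared >= 0.70:
        # Assumption: 0.70 itself counts as high ("values like 0.70 are fairly high").
        return "high"
    if r_squared > 0.30:
        # x helps somewhat; consider adding more variables to the model.
        return "middle"
    # x explains little of the variability in y.
    return "low"


print(interpret_r_squared(0.81))  # prints: high
print(interpret_r_squared(0.50))  # prints: middle
```

Treat the labels as rules of thumb rather than hard boundaries; a value of 0.69 is not meaningfully different from 0.71.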
For the textbook weight example, the value of r (the correlation coefficient)
is 0.93. Squaring this result, you get r² = 0.8649. That number means
approximately 86 percent of the variability you find in average textbook
weights for all students (y-values) is explained by the average student
weight (x-values). This percentage tells you that the model of using student
weight to estimate textbook weight is a good bet.
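
To see why squaring r gives the share of variability explained, it can help to compute both quantities side by side. The sketch below first repeats the arithmetic from the text (0.93 squared), then checks on a tiny made-up data set that r² equals 1 minus the leftover (unexplained) variability divided by the total variability. The data in `hypo_x` and `hypo_y` are hypothetical and are not the textbook weight data; the function names are also chosen just for this illustration.

```python
from math import sqrt


def correlation(x, y):
    """Correlation coefficient r between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def explained_fraction(x, y):
    """Share of the variability in y explained by the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variability left over after using the line (sum of squared residuals).
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Total variability in y around its mean.
    sst = sum((b - mean_y) ** 2 for b in y)
    return 1 - sse / sst


# The arithmetic from the text: r = 0.93.
print(round(0.93 ** 2, 4))  # prints: 0.8649

# Hypothetical data, for illustration only.
hypo_x = [1, 2, 3, 4, 5]
hypo_y = [2, 4, 5, 4, 5]
print(round(correlation(hypo_x, hypo_y) ** 2, 4))      # prints: 0.6
print(round(explained_fraction(hypo_x, hypo_y), 4))    # prints: 0.6
```

For the made-up data, both routes give 0.6, so about 60 percent of the variability in y is explained by x.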
In the case of simple linear regression, you have only one x variable, but in
Chapter 5, you can see models that contain more than one x variable. In this
situation, you use r² to help sort out the contributions each individual
variable brings to the model.
                                                    Scoping for outliers
Sometimes life isn’t perfect (oh really?), and you may find a residual in your
otherwise tidy data set that totally sticks out. Such a point is called an
outlier; that is, its standardized residual is at or beyond +3 or –3. It
threatens to blow the conditions of your regression model and send you crying
to your professor.
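
The +3 or –3 rule can be checked mechanically. The sketch below fits a least-squares line, standardizes each residual, and reports the positions of any points at or beyond the cutoff. Two things here are assumptions rather than anything stated in the text: each residual is standardized by dividing it by the residual standard error, sqrt(SSE / (n - 2)), whereas statistical software often uses a slightly different version that also adjusts for leverage; and the data are made up so that every point but one lies exactly on a line. The function names are hypothetical.

```python
from math import sqrt


def standardized_residuals(x, y):
    """Residuals from the least-squares line, divided by their standard error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    # Assumption: scale by the residual standard error sqrt(SSE / (n - 2)).
    s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / s for e in residuals]


def flag_outliers(x, y, cutoff=3.0):
    """Positions (0-based) of points whose standardized residual is at or beyond the cutoff."""
    return [i for i, z in enumerate(standardized_residuals(x, y)) if abs(z) >= cutoff]


# Hypothetical data: y = 2x for every point except one that is 10 units too high.
x = list(range(1, 21))
y = [2 * a for a in x]
y[9] += 10

print(flag_outliers(x, y))  # prints: [9]
```

A flagged position is only a prompt to go look at that data value, not a verdict that it should be thrown out.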
Before you panic, the best thing to do is to examine that outlier more closely.
First, can you find an error in that data value? Did someone report her age as
642, for instance? (After all, mistakes do happen.) If you do find a certifiable