Page 92 - Statistics II for Dummies

76       Part II: Using Different Types of Regression to Make Predictions



                                If you look at only one residual plot, choose the one in the upper-right corner
                                of Figure 4-6, the plot of the fitted values (the values of y on the line) versus the
                                standardized residuals. Most problems with model fit will show up on that plot
                                because a residual is defined as the difference between the observed value of y
                                and the fitted value of y. In a perfect world, every residual would be zero;
                                a large residual (such as the one where the estimated textbook weight is
                                20 pounds for students averaging 142 pounds; see Figure 4-1) shows up as a
                                point far from zero. This graph also shows you deviations from the overall
                                pattern of the line; for example, if large residuals are on the extremes of this
                                graph (very low or very high fitted values), the line isn’t fitting in those areas.
                                On balance, you can say this line fits well at least for grades 1 through 11.
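The residual definition above (observed y minus fitted y) can be sketched in a few lines of Python. This is a minimal illustration, not the book's Minitab output: the data values are invented, NumPy is an assumed dependency, the function name `residual_table` is hypothetical, and the standardized residuals use the usual simple-regression formula (residual divided by its estimated standard deviation, with a leverage adjustment).

```python
import numpy as np

def residual_table(x, y):
    """Return fitted values, raw residuals, and standardized residuals
    for a least-squares line through (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    # Least-squares line: np.polyfit returns [slope, intercept] for degree 1.
    slope, intercept = np.polyfit(x, y, 1)
    fitted = intercept + slope * x

    # Residual = observed y minus fitted y.
    resid = y - fitted

    # Residual standard error, with n - 2 degrees of freedom.
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))

    # Leverage of each point in simple linear regression.
    leverage = 1.0 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)

    standardized = resid / (s * np.sqrt(1.0 - leverage))
    return fitted, resid, standardized

# Invented data: roughly y = 2x + 1, with one unusually high y at x = 5.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.9, 7.2, 8.8, 16.0, 13.1, 14.8, 17.2]

fitted, resid, standardized = residual_table(x, y)
for f, e, z in zip(fitted, resid, standardized):
    print(f"fitted = {f:6.2f}   residual = {e:6.2f}   standardized = {z:6.2f}")
```

Plotting `fitted` on the horizontal axis against `standardized` on the vertical axis gives the kind of residual plot described above; the point for x = 5 sits far from zero.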


                                Using r² to measure model fit

                                One important way to assess how well the model fits is to use a statistic
                                called the coefficient of determination, or r². This statistic takes the value of
                                the correlation, r, and squares it to give you a percentage. You interpret r² as
                                the percentage of variability in the y variable that’s explained by, or due to,
                                its relationship with the x variable.
                                The y-values of the data you collect have a great deal of variability in and
                                of themselves. You look for another variable (x) that helps you explain that
                                variability in the y-values. After you put that x variable into the model and
                                find that it’s highly correlated with y, you want to find out how well this
                                model did at explaining why the values of y are different.
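As a rough sketch of what "percentage of variability explained" means, the following Python compares two ways of getting the coefficient of determination for a straight-line fit: squaring the correlation, and computing one minus the ratio of unexplained variability (around the line) to total variability (around the mean of y). The data are invented for illustration, NumPy is an assumed dependency, and the function name `r_squared` is hypothetical.

```python
import numpy as np

def r_squared(x, y):
    """Share of the variability in y explained by a least-squares line in x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    slope, intercept = np.polyfit(x, y, 1)
    fitted = intercept + slope * x

    sse = np.sum((y - fitted) ** 2)    # variability left over around the line
    sst = np.sum((y - y.mean()) ** 2)  # total variability in y
    return 1.0 - sse / sst

# Invented data for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, 5.9, 8.3, 9.8, 12.2]

r = np.corrcoef(x, y)[0, 1]  # the correlation, r
r2 = r_squared(x, y)

print(f"r = {r:.3f}")
print(f"r squared from the correlation = {r * r:.3f}")
print(f"r squared from the variability = {r2:.3f}")
```

For a simple linear regression with one x variable, the two calculations agree, which is why squaring r can be read as the percentage of variability in y that the line explains.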

                                Note that you have to interpret r² using different standards than those for
                                interpreting r. Because squaring a number between –1 and +1 results in a
                                number closer to zero (except for +1 and –1, which both become 1, and 0,
                                which stays 0), an r² of 0.49 isn’t too bad, because it’s the square of r = 0.7,
                                which is a fairly strong correlation.
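A quick way to see why r and its square need different standards is to tabulate a few correlations next to their squares. The values of r below are arbitrary examples, and `squared_table` is a hypothetical helper name.

```python
def squared_table(correlations):
    """Pair each correlation r with its square."""
    return [(r, r * r) for r in correlations]

# Arbitrary example correlations, including the endpoints and zero.
table = squared_table([-1.0, -0.7, -0.3, 0.0, 0.3, 0.7, 1.0])

for r, r2 in table:
    print(f"r = {r:+.1f}   r squared = {r2:.2f}")
```

Every value strictly between –1 and +1 (other than 0) moves closer to zero when squared, and the sign is lost, so the square tells you about strength of fit but not direction.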
                                The following are some general guidelines for interpreting the value of r²:
                                  ✓ If the model containing x explains a lot of the variability in the y-values,
                                    then r² is high (in the 80 to 90 percent range is considered to be
                                    extremely high). Values like 0.70 are still considered fairly high. A high
                                    percentage of explained variability means that the line fits well because
                                    there’s not much left to explain about the value of y other than using x
                                    and its relationship to y. So a larger value of r² is a good thing.
                                  ✓ If the model containing x doesn’t help much in explaining the difference
                                    in the y-values, then the value of r² is small (closer to zero; between, say,
                                    0.00 and 0.30 roughly). The model, in this case, wouldn’t fit well. You
                                    need another variable to explain y other than the one you already tried.
                                  ✓ Values of r² that fall in the middle (between, say, 0.30 and 0.70) mean
                                    that x does help somewhat in explaining y, but it doesn’t do the job






