Page 227 - Statistics II for Dummies

Chapter 12: Regression and ANOVA: Surprise Relatives!


                                Assessing the fit of the regression model


                                Before you go ahead and use a regression model to make predictions for y
                                based on an x variable, you must first assess the fit of your model. You can
                                do this with a scatterplot and correlation or R².

                                Using a scatterplot and correlation
                                One way to get a rough idea of how well your regression model fits is by
                                using a scatterplot, which is a graph showing all the pairs of data plotted in
                                the x-y plane. Use the scatterplot to see whether the data appear to fall in the
                                pattern of a line. If the data appear to follow a straight-line pattern (or even
                                something close to that — anything but a curve or a scattering of points that
                                has no pattern at all), you calculate the correlation, r, to see how strong the
                                linear relationship between x and y is. The closer r is to +1 or –1, the stronger
                                the relationship; the closer r is to zero, the weaker the relationship. Minitab
                                can do scatterplots and correlations for you; see Chapter 4 for more on
                                simple linear regression, including making a scatterplot and finding the
                                value of r.
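
                                As a rough illustration of the calculation described above, the following
                                Python sketch computes the Pearson correlation r from paired data. The
                                function name pearson_r and the small data set are hypothetical, made up
                                for this example; they are not the book's data and this is not how Minitab
                                works internally. You should still inspect the scatterplot first, because
                                r only describes a straight-line pattern.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for illustration (not the book's data set)
years_education = [8, 10, 12, 12, 14, 16, 16, 18]
internet_hours = [2, 3, 6, 4, 7, 9, 6, 11]

r = pearson_r(years_education, internet_hours)
print(f"r = {r:.3f}")  # closer to +1 or -1 means a stronger linear relationship
```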

                                If the data don’t have a significant correlation and/or the scatterplot doesn’t
                                look linear, stop the analysis; you can’t go further to find a line that fits a
                                relationship that doesn’t exist.

                                Using R²
                                The more general way of assessing the fit, not only of a simple linear
                                regression model but of many other models too, is to use R², also known as
                                the coefficient of determination. (For example, you can use this method in
                                multiple, nonlinear, and logistic regression models in Chapters 5, 7, and 8,
                                to name a few.) In simple linear regression, the value of R² (written by
                                Minitab and statisticians with a capital R) is equal to the square of the
                                Pearson correlation coefficient, r (written with a lowercase r). In all other
                                situations, R² provides a more general measure of model fit. (Note that r
                                only measures the fit of a straight-line relationship between one x variable
                                and one y variable; see Chapter 4.) An even better statistic, R² adjusted,
                                modifies R² to account for the number of variables in the model. (For more
                                information on R² and its use and interpretation, see Chapter 6.)
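
                                To make the relationship between r, R², and R² adjusted concrete, here is
                                a minimal Python sketch. It fits a least-squares line, computes R² as
                                1 - SSE/SST, checks that it matches the square of r, and then applies the
                                standard adjustment 1 - (1 - R²)(n - 1)/(n - p - 1), where p is the number
                                of x variables. The function names and the data are assumptions made up
                                for illustration; they do not come from the book or from Minitab.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(ys, fitted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Adjust R^2 for n observations and p predictor variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical data, invented for illustration (not the book's data set)
xs = [8, 10, 12, 12, 14, 16, 16, 18]
ys = [2, 3, 6, 4, 7, 9, 6, 11]

intercept, slope = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]
r2 = r_squared(ys, fitted)

# Pearson r computed directly, to compare r**2 with R^2
mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
r = sxy / math.sqrt(sxx * syy)

adj_r2 = adjusted_r_squared(r2, n=len(xs), p=1)
print(f"R^2 = {r2:.4f}, r^2 = {r * r:.4f}, adjusted R^2 = {adj_r2:.4f}")
```

                                With one x variable the first two printed values agree, while R² adjusted
                                comes out slightly smaller because it penalizes the model for each
                                variable it uses.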

                                The value of R² adjusted for the model that uses education to estimate
                                Internet use (see Figure 12-1) is equal to 41 percent. This value reflects
                                the percentage of variability in Internet use that can be explained by a
                                person’s years of education. This number isn’t close to 100 percent, but
                                note that r, roughly the square root of 0.41, is about 0.64, which in the
                                case of linear regression indicates a moderate relationship.
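
                                The arithmetic here can be checked in a few lines of Python. Strictly, r
                                is the square root of plain R² rather than of R² adjusted, so the sketch
                                also inverts the standard adjustment formula to recover R². The sample
                                size is not stated in this passage, so the value of 250 below is purely
                                hypothetical, as is the helper name r2_from_adjusted. A square root gives
                                only the magnitude of r; its sign comes from the slope of the line.

```python
import math

adjusted_r2 = 0.41  # R^2 adjusted reported in the text (41 percent)


def r2_from_adjusted(adj_r2, n, p=1):
    """Invert adj = 1 - (1 - R^2)(n - 1)/(n - p - 1) to recover plain R^2."""
    return 1 - (1 - adj_r2) * (n - p - 1) / (n - 1)


# Square root of the reported value, as quoted in the text
print(round(math.sqrt(adjusted_r2), 2))  # 0.64

# Hypothetical sample size: the passage does not give n
hypothetical_n = 250
r2 = r2_from_adjusted(adjusted_r2, hypothetical_n)
print(f"R^2 = {r2:.4f}, |r| = {math.sqrt(r2):.4f}")
```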











