Page 136 - Intermediate Statistics for Dummies

Chapter 6: One Step Forward and Two Steps Back: Regression Model Selection

How well does the model fit?
The details regarding the formulas used behind the model selection procedures in this chapter are beyond the scope of this book. However, knowing what the procedure is doing and how to interpret the results are what’s most important. To assess the fit of any multiple regression model, you can use the following three techniques: R², R² adjusted, and Mallows’s C-p. You can find all three on the bottom line of the Minitab output when you do any sort of model selection procedure.

I describe these techniques in the following:
- R²: R² is the percentage of the variability in the y values that’s explained by the model. It falls between 0 percent and 100 percent (0 and 1.0). Values closer to 0 mean the model doesn’t do a good job of explaining y. Values closer to 1.0 mean the model does an excellent job. Typically, I say that you can consider R² values higher than 0.70 to be good.

- R² adjusted: R² adjusted is the value of R², adjusted down for a higher number of variables in the model (which makes it much more useful than the regular value of R²). A high value of R² adjusted means the model you have is fitting the data very well. I typically consider a value of 0.70 to be high for R² adjusted.

- Mallows’s C-p: Mallows’s C-p is another measure of how well a model fits. It basically looks at how much error is left unexplained by a model with k predictor (x) variables compared to the average error left over from the full model (with all the x variables) and adjusts it for the number of variables in the model. The smaller Mallows’s C-p is, the better, because when it comes to the amount of error in your model, less is more.
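If you're curious what the three measures look like as arithmetic, here's a minimal Python sketch. It isn't taken from this book (the formulas are outside its scope) and it isn't what Minitab runs internally; it assumes the standard textbook definitions based on the error sum of squares (SSE) and the total sum of squares (SST). The function names and every number in it are made up for illustration.

```python
def r_squared(sse, sst):
    """Share of the variability in y explained by the model (0 to 1.0)."""
    return 1.0 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """R-squared adjusted down for k predictors fit to n observations."""
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


def mallows_cp(sse_k, sse_full, n, k, k_full):
    """Mallows's C-p for a k-predictor model versus the full model."""
    # Average error left over from the full model (all k_full predictors)
    mse_full = sse_full / (n - k_full - 1)
    # Number of parameters in the smaller model, counting the intercept
    p = k + 1
    return sse_k / mse_full - (n - 2 * p)


# Hypothetical numbers, invented for illustration only
n = 25          # observations
sst = 100.0     # total variability in y
sse_3 = 20.0    # error left over by a 3-predictor model
sse_5 = 19.0    # error left over by the full 5-predictor model

r2 = r_squared(sse_3, sst)
r2_adj = adjusted_r_squared(sse_3, sst, n, 3)
cp = mallows_cp(sse_3, sse_5, n, 3, 5)

print(f"R2 = {r2:.3f}, R2 adjusted = {r2_adj:.3f}, C-p = {cp:.1f}")
```

With these made-up inputs, R² comes out at 0.800, R² adjusted at about 0.771, and C-p at 3.0. Notice that R² adjusted lands below R², which is the "adjusted down" part at work.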
Always use R² adjusted rather than the regular R² to assess the fit of a multiple regression model. With every addition of a new variable into a multiple regression model, the value of R² stays the same or increases; it will never go down. That’s because a new variable will either help explain some of the variability in the y’s (thereby increasing R² by definition), or it will do nothing (leaving R² exactly where it was before). So theoretically, you could just keep adding more and more variables into the model just for the sake of getting a larger value of R². Here’s why R² adjusted is important: It keeps you from adding more and more variables by taking into account how many variables are in the model. This way, the value of R² adjusted can actually decrease if the added value of the additional variable is outweighed by the number of variables in the model. This gives you an idea of how much or how little added value you get from a bigger model (bigger isn’t always better).
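To see that tug-of-war in numbers, here's a small Python sketch that tracks both measures as a model grows from one predictor to five. It assumes the standard textbook formulas for R² and R² adjusted (this book doesn't give them), and the SSE values are hypothetical, invented so that the fourth and fifth variables explain almost nothing new.

```python
def r_squared(sse, sst):
    """Share of the variability in y explained by the model."""
    return 1.0 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """R-squared penalized for using k predictors on n observations."""
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


# Hypothetical values, for illustration only
n = 25
sst = 100.0
# Error left unexplained (SSE) as predictors are added one at a time.
# SSE can only stay the same or shrink as the model gets bigger.
sse_by_k = {1: 40.0, 2: 25.0, 3: 20.0, 4: 19.9, 5: 19.85}

r2_by_k = {k: r_squared(sse, sst) for k, sse in sse_by_k.items()}
adj_by_k = {k: adjusted_r_squared(sse, sst, n, k)
            for k, sse in sse_by_k.items()}

for k in sorted(sse_by_k):
    print(f"k={k}: R2={r2_by_k[k]:.4f}  R2 adjusted={adj_by_k[k]:.4f}")

# The model size with the largest R2 adjusted
best_k = max(adj_by_k, key=adj_by_k.get)
print("Largest R2 adjusted at k =", best_k)
```

In this made-up example, R² creeps up at every step (0.8000 to 0.8010 to 0.8015 over the last two additions), while R² adjusted peaks at three predictors (about 0.7714) and then drops to 0.7612 and about 0.7493. The two extra variables cost more than they contribute.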