Page 126 - Statistics II for Dummies
                       Part II: Using Different Types of Regression to Make Predictions
                                  Assessing the fit of multiple regression models


                                  For any model selection procedure, assessing the fit of each model being
                                  considered is built into the process. In other words, as you go through all the
                                  possible models, you’re always keeping an eye on how well each model fits.
                                  So before you get into a discussion of how to do the best subsets procedure,
                                  you need criteria to assess how well a particular model fits a data set.

                                  Although there are tons of different statistics for assessing the fit of
                                  regression models, I discuss the most popular ones: R² (simple linear
                                  regression only), R² adjusted, and Mallows’s C-p. All three appear on the
                                  bottom line of the Minitab output when you do any sort of model selection
                                  procedure. Here’s a breakdown of the assessment techniques:
                                   ✓ R²: R² is the percentage of the variability in the y values that’s explained
                                      by the model. It falls between 0 and 100 percent (0 and 1.0). In simple
                                      linear regression (see Chapter 4), a high value of R² means the line fits
                                      well, and a low value of R² means the line doesn’t fit well.
                                      When you have multiple regression, however, there’s a bit of a catch
                                      here. As you add more and more variables (whether or not they’re
                                      significant), the value of R² increases or stays the same; it never goes
                                      down. This can result in an inflated measure of how well the model fits.
                                      Of course, statisticians have a fix for the problem, which leads me to
                                      the next item on this list.
                                   ✓ R² adjusted: R² adjusted takes the value of R² and adjusts it downward
                                      according to the number of variables in the model. The higher the
                                      number of variables in the model, the lower the value of R² adjusted will
                                      be, compared to the original R².
                                      A high value of R² adjusted means the model you have is fitting the data
                                      very well (the closer to 1, the better). In my experience, a value of 0.70
                                      is typically considered okay for R² adjusted, and the higher the better.
                                      Always use R² adjusted rather than the regular R² to assess the fit of a
                                      multiple regression model. With every addition of a new variable into a
                                      multiple regression model, the value of R² stays the same or increases. It
                                      will never go down because a new variable will either help explain some
                                      of the variability in the y’s (thereby increasing R² by definition), or it will
                                      do nothing (leaving R² exactly where it was before). So theoretically, you
                                      could just keep adding more and more variables into the model just for
                                      the sake of getting a larger value of R².
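
                                      To make this point concrete, here is a minimal sketch in Python rather
                                      than Minitab, assuming NumPy is installed. The helper names r_squared
                                      and adjusted_r_squared, the simulated data, and the random seed are
                                      illustrative assumptions and do not come from the text. The adjustment
                                      uses the standard textbook formula
                                      R² adjusted = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of
                                      observations and k is the number of x variables in the model.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of a least-squares fit of y on the columns of X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    sse = np.sum(residuals ** 2)           # unexplained variability
    sst = np.sum((y - np.mean(y)) ** 2)    # total variability in y
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Standard adjustment: penalize R^2 for the k x variables in the model."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Simulated data (an assumption for illustration): y depends on x1 only.
rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
junk = rng.normal(size=(n, 5))             # five variables unrelated to y
y = 3.0 + 2.0 * x1 + rng.normal(size=n)

results = []
for extra in range(6):
    # Model with x1 plus the first `extra` junk variables.
    X = np.column_stack([x1, junk[:, :extra]])
    k = X.shape[1]
    r2 = r_squared(X, y)
    r2_adj = adjusted_r_squared(r2, n, k)
    results.append((k, r2, r2_adj))
    print(f"k={k}  R^2={r2:.4f}  R^2 adjusted={r2_adj:.4f}")
```

                                      Each printed line is one model. R² should never drop from one line to
                                      the next, even though the added variables are pure noise, whereas
                                      R² adjusted is free to fall when a new variable contributes little.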
                                      R² adjusted is important because it keeps you from adding more and
                                      more variables by taking into account how many variables there already
                                      are in the model. The value of R² adjusted can actually decrease if the


