Page 147 - Intermediate Statistics for Dummies

                                         Part II: Making Predictions by Using Regression
                                                    these results, it would be easy to have a major freak out over which one to
                                                    pick, but never fear, Mallows is here (along with his friendly sidekick, the
                                                    R² adjusted).
                                                    Looking at Figure 6-5, you see that as the number of variables in the model
                                                    increases, R² adjusted peaks and then drops way off. That’s because R²
                                                    adjusted takes into account the number of variables in the model and reduces
                                                    R² accordingly. You can see that R² adjusted peaks at a level of 74.1 percent
                                                    for two models. The corresponding models are the top two-variable
                                                    model (right leg strength and overall leg strength) and the best three-variable
                                                    model (right foot strength, right foot flexibility, and overall leg strength).
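To see why R² adjusted can fall even while plain R² creeps up, here is a minimal sketch of the usual textbook formula, 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k is the number of x variables. The formula isn't spelled out in this section, and the sample size and R² values below are made-up numbers for illustration only; they are not the values behind Figure 6-5.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Return R-squared adjusted for the number of x variables (standard formula)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than x variables plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical illustration: each extra variable nudges R-squared up a hair,
# but the penalty for the extra variable outweighs the gain.
n_obs = 30
for n_predictors, r_squared in [(2, 0.800), (3, 0.805), (4, 0.806)]:
    adj = adjusted_r_squared(r_squared, n_obs, n_predictors)
    print(n_predictors, round(adj, 4))
```

In this toy run the adjusted value shrinks with each added variable, which is the same peak-then-drop pattern the text describes.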
                                                    Now look at Mallows’s C-p for these two models. Notice that Mallows’s C-p is 0
                                                    for the two-variable model and 1.3 for the three-variable model. Both values
                                                    are small compared to others in Figure 6-5, but because Mallows’s C-p is
                                                    smaller for the two-variable model and because it has one fewer variable in it,
                                                    you should choose the two-variable model (right leg strength and overall leg
                                                    strength) as the final model, using the best subsets procedure.
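The tie-breaking logic in that paragraph can be sketched in a few lines of code. The `mallows_cp` helper uses the standard definition, SSE of the subset model divided by the MSE of the full model, minus (n - 2p), where p counts the intercept plus the x variables; this section doesn't state the formula, so treat it as an assumption. The `Candidate` and `pick_model` names are hypothetical, and only the two finalists quoted in the text are entered here, not the full contents of Figure 6-5.

```python
from typing import NamedTuple


def mallows_cp(sse_subset, mse_full, n_obs, n_params):
    """Standard Mallows's C-p: SSE_p / MSE_full - (n - 2p).

    n_params counts the intercept plus the x variables in the subset model.
    """
    return sse_subset / mse_full - (n_obs - 2 * n_params)


class Candidate(NamedTuple):
    name: str
    n_vars: int
    adj_r2: float  # R-squared adjusted, in percent
    cp: float      # Mallows's C-p


def pick_model(candidates):
    """Keep the models with the top adjusted R-squared, then prefer the
    smaller C-p and, if still tied, the model with fewer variables."""
    top = max(c.adj_r2 for c in candidates)
    finalists = [c for c in candidates if c.adj_r2 == top]
    return min(finalists, key=lambda c: (c.cp, c.n_vars))


# The two finalists described in the text (values as quoted there).
finalists = [
    Candidate("two-variable", 2, 74.1, 0.0),
    Candidate("three-variable", 3, 74.1, 1.3),
]
print(pick_model(finalists).name)  # prints "two-variable"
```

This is only one reasonable way to encode the rule; in practice you would still eyeball the whole best subsets output rather than let a sort decide for you.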
                                         Comparing Model Selection Procedures
                                                    Upon examining the results of the previous sections, the first concern you
                                                    may have is why you don’t get the same results with all three model selection
                                                    procedures. (I suppose one could argue that if you got the same results all
                                                    the time, you would have no need for three different procedures, right? But
                                                    that’s beside the point.) All attempts at humor aside, I address this issue, as
                                                    well as compare how the procedures (from the previous sections) stack up
                                                    against one another here in this section.
                                                    Why don’t all the procedures get the same results?
                                                    The forward and backward selection procedures’ overall goals and general
                                                    process are similar. In both the forward and backward selection procedures,
                                                    you’re trying to fit a good model to the data. In both procedures, you evalu-
                                                    ate each new model based on how it compares to the previous model that
                                                    you examined (which has only a one-variable difference). But because the
                                                    forward selection procedure starts at one end of the spectrum of the number
                                                    of x variables and the backward selection procedure starts at the other end,
                                                    the two procedures build their final models differently, one variable at a
                                                    time. Therefore, the two procedures might meet in the middle and give the
                                                    same model, but that’s certainly not the norm.
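Here is a rough sketch of how the two procedures can start at opposite ends and stop at different models. It is a simplification under stated assumptions: each step keeps a one-variable change only if it improves a single score (think R² adjusted), whereas statistical software typically decides entry and removal with significance tests. The variable names `x1` to `x3` and every number in `TOY_SCORES` are hypothetical, chosen just to make the effect visible.

```python
def forward_selection(variables, score):
    """Start with no x variables; keep adding the one that most improves the score."""
    chosen = frozenset()
    best = score(chosen)
    while True:
        trials = [chosen | {v} for v in variables if v not in chosen]
        if not trials:
            return chosen
        candidate = max(trials, key=score)
        if score(candidate) <= best:  # no single addition helps, so stop
            return chosen
        chosen, best = candidate, score(candidate)


def backward_selection(variables, score):
    """Start with all x variables; keep dropping the one whose removal helps most."""
    chosen = frozenset(variables)
    best = score(chosen)
    while chosen:
        trials = [chosen - {v} for v in chosen]
        candidate = max(trials, key=score)
        if score(candidate) <= best:  # no single removal helps, so stop
            return chosen
        chosen, best = candidate, score(candidate)
    return chosen


# Hypothetical scores (higher is better) for every subset of three x variables.
TOY_SCORES = {
    frozenset(): 0.00,
    frozenset({"x1"}): 0.50,
    frozenset({"x2"}): 0.40,
    frozenset({"x3"}): 0.30,
    frozenset({"x1", "x2"}): 0.52,
    frozenset({"x1", "x3"}): 0.51,
    frozenset({"x2", "x3"}): 0.70,
    frozenset({"x1", "x2", "x3"}): 0.69,
}


def toy_score(subset):
    return TOY_SCORES[frozenset(subset)]


VARIABLES = ["x1", "x2", "x3"]
print(sorted(forward_selection(VARIABLES, toy_score)))   # ['x1', 'x2', 'x3']
print(sorted(backward_selection(VARIABLES, toy_score)))  # ['x2', 'x3']
```

Forward selection grabs `x1` first because it is the best single variable, and it never gets a chance to let go of it. Backward selection starts with everything, drops `x1`, and settles on the pair. Each procedure only ever compares models that differ by one variable, so the path taken matters.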