Page 148 - Intermediate Statistics for Dummies

                                          Chapter 6: One Step Forward and Two Steps Back: Regression Model Selection
                                                    In the punt distance example, you can see that in Figure 6-2 (forward
                                                    selection) the computer includes hang time first because it makes the
                                                    biggest contribution toward estimating y. But in Figure 6-4 (backward
                                                    selection), all the variables are in the model from the get-go. After the
                                                    weakest variable on all counts (left foot flexibility) is eliminated, the
                                                    remaining variables are the ones strongly related to hang time (see
                                                    Figure 6-1). That makes hang time a redundant variable, so it is removed.
                                                    The best subsets procedure takes a totally different approach from forward
                                                    and backward selection. It looks at all possible models you could have and
                                                    chooses the best ones at each level (one, two, three variables, and so on).
                                                    This procedure has no step-by-step building process in which later models
                                                    depend on what was selected in earlier steps. That means the best subsets
                                                    procedure can easily give different results from either of the other two
                                                    procedures, simply because it has many more possible models to choose from.
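To make the "all possible models, best at each level" idea concrete, here is a minimal sketch in Python. It is not Minitab's implementation. The predictor names a, b, and c and the six data rows are made up for illustration (y is constructed to equal 2a + 3c exactly), and ranking models by R-squared is an assumption; statistical software may rank by other criteria.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def r_squared(columns, y):
    """R-squared of a least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst

def best_subsets(predictors, y):
    """For each model size, return (R-squared, variables) of the best model."""
    names = sorted(predictors)
    best = {}
    for size in range(1, len(names) + 1):
        # Every subset of this size is fit independently of earlier sizes.
        scored = [(r_squared([predictors[v] for v in subset], y), subset)
                  for subset in combinations(names, size)]
        best[size] = max(scored)
    return best

# Hypothetical data, invented for this sketch (not the punt distance data).
predictors = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [5, 3, 6, 2, 4, 1],
    "c": [1, 0, 0, 1, 0, 1],
}
y = [5, 4, 6, 11, 10, 15]  # equals 2*a + 3*c for every row

result = best_subsets(predictors, y)
for size, (r2, subset) in sorted(result.items()):
    print(size, subset, round(r2, 3))
```

Notice that the loop never looks back at what it chose for a smaller model size, which is exactly how best subsets differs from forward and backward selection. With k candidate variables it fits 2^k - 1 models in all (7 here), so the work grows quickly as you add variables.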
                                                    How do the procedures stack up against each other?
                                                    So the big question is: Which model selection procedure is the best one?
                                                    You won’t find a straight answer to that. The debate over this issue goes
                                                    on and on among the various research groups that analyze their data by
                                                    using model selection procedures. All three procedures are available in
                                                    Minitab, for example, so all three are considered viable. However, many
                                                    statisticians do prefer one model selection procedure over the others,
                                                    which I reveal to you later in this section along with the positives and
                                                    negatives of each procedure.
                                                    Looking at the positives
                                                    What is nice about each of these procedures is that they have some order
                                                    to them and they make sense. You don’t take a haphazard approach with any
                                                    of the procedures, and any two people using the same procedure to build
                                                    the best model from the same data set would get the same answer, which is
                                                    reassuring. All three procedures also usually provide reasonable results
                                                    and final models that have interpretative value, and each has its own plus
                                                    side. Forward selection keeps the models as simple as possible; backward
                                                    selection helps you avoid missing any important variables; and the best
                                                    subsets procedure examines every possible model and makes straight
                                                    comparisons among them.