Page 149 - Intermediate Statistics for Dummies

                                         Part II: Making Predictions by Using Regression
                                                    Because all three model selection procedures are available in Minitab, the
                                                    temptation may be to just run all three procedures, see what you get, and
                                                    choose the one you like the best. This approach, called data fishing or data
                                                    snooping, isn’t a good idea because it can lead to conclusions that others
                                                    can’t confirm (for more on these no-no’s, flip to Chapter 1).
                                                    Examining the downsides
                                                    The forward and backward selection procedures are somewhat limiting in
                                                    the way they build their models. After hang time, for example, is eliminated
                                                    in the backward selection procedure (in Figure 6-4), it never appears again
                                                    in any later models. After hang time is added in the forward selection
                                                    procedure, it stays in every model from then on. The best subsets procedure (in
                                                    Figure 6-5), on the other hand, examines all possible models including those
                                                    containing hang time and those that don’t.
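
The one-way nature of backward selection can be sketched in a few lines of Python. This is only an illustration of the structure: apart from hang time, the variable names are placeholders, and the removal order is an assumption made up for the sketch, not the result shown in Figure 6-4.

```python
# Hypothetical predictor names; only "hang_time" comes from the text.
predictors = ["hang_time", "x2", "x3", "x4", "x5", "x6"]

def backward_path(variables, removal_order):
    """Return the chain of models visited when variables are dropped in removal_order."""
    current = list(variables)
    path = [tuple(current)]          # start from the full model
    for var in removal_order:
        current.remove(var)          # a dropped variable is never reconsidered
        path.append(tuple(current))
    return path

# Assumed removal order, for illustration only.
path = backward_path(predictors, ["hang_time", "x5"])
for model in path:
    print(model)

# Once hang_time is dropped, no later model in the chain contains it.
dropped_at = next(i for i, m in enumerate(path) if "hang_time" not in m)
assert all("hang_time" not in m for m in path[dropped_at:])
```

Each model in the chain is contained in the one before it, so backward selection only ever walks one narrow path through the possible models.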
                                                    Standing out above the rest: The best subsets procedure
                                                    Because of its versatility and the comprehensive way it looks at all possible
                                                    models, the best subsets procedure is generally the procedure of choice among
                                                    statisticians. With six possible variables having two possibilities for each one (being
                                                    included or not being included in the model), you have 2 * 2 * 2 * 2 * 2 * 2 = 64
                                                    possible models to look at in the best subsets procedure. Notice that this
                                                    set of all possible (64) models includes all the models shown in the step-by-
                                                    step process of forward and backward selection.
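
The count of 64 can be checked by listing every combination of variables. The sketch below uses Python's standard itertools module. Apart from hang time, the variable names are placeholders invented for the example, and the code only enumerates the candidate models; it doesn't fit any regressions.

```python
from itertools import combinations

# Hypothetical predictor names; only "hang_time" comes from the text.
predictors = ["hang_time", "x2", "x3", "x4", "x5", "x6"]

def all_subsets(variables):
    """Yield every subset of variables, from no predictors up to all of them."""
    for size in range(len(variables) + 1):
        for subset in combinations(variables, size):
            yield subset

models = list(all_subsets(predictors))
print(len(models))  # 64, counting the model with no predictors at all

# Half of the candidate models include hang time and half don't.
with_hang_time = [m for m in models if "hang_time" in m]
print(len(with_hang_time))  # 32
```

Every model that forward or backward selection could visit is a tuple somewhere in this list, which is why best subsets can't miss a model that the step-by-step procedures would find.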