Intermediate Statistics for Dummies
Chapter 6: One Step Forward and Two Steps Back: Regression Model Selection

Using the Forward Model Selection Procedure
The first of the three model selection procedures I present in this chapter is called forward selection. This process gives you a systematic way to select a good model for predicting y. It starts with no variables at all and then adds one variable, then another, then another, each time including the variable that contributes the most toward estimating y, given the variables that are already in the model.
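In symbols, the rule in the paragraph above can be sketched as follows. This rests on one assumption the text doesn't state at this point: that a candidate's contribution is measured by the t statistic of its coefficient (the t-test used in step 3 of the procedure). The symbols S (the set of x variables already in the model) and SE(b_j) (the standard error of b_j) are notation introduced only for this illustration.

```latex
\begin{aligned}
t_j &= \frac{b_j}{\operatorname{SE}(b_j)}
  \quad \text{from fitting } y = b_0 + \sum_{k \in S} b_k x_k + b_j x_j,
  \qquad j \notin S \\
j^{*} &= \operatorname*{arg\,max}_{j \notin S} \, \lvert t_j \rvert
\end{aligned}
```

Because every candidate model at a given step has the same number of coefficients (and so the same degrees of freedom), the candidate with the largest |t_j| also has the smallest p-value, so ranking by either gives the same choice.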
This section shows you how the forward selection procedure works for selecting a final regression model, and the philosophy behind it. It also shows you how to assess the fit of the final model by using some new criteria.
Adding variables one at a time
The forward selection procedure starts with a model that contains no x variables and then adds x variables one at a time until the final model has been reached.
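Written out, the sequence of models looks like this. Here x_{(1)} stands for the first variable to enter, x_{(2)} for the second, and so on; this notation is introduced just for the illustration. Each time a variable enters, all the coefficients are re-estimated, so b_1 in the two-variable model generally differs from b_1 in the one-variable model.

```latex
\begin{aligned}
\text{Start:} \quad & y = b_0 \\
\text{After one addition:} \quad & y = b_0 + b_1 x_{(1)} \\
\text{After two additions:} \quad & y = b_0 + b_1 x_{(1)} + b_2 x_{(2)} \\
& \;\vdots
\end{aligned}
```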
Here’s how the forward selection procedure works in general. Before the hair begins to stand up on the back of your neck, though, note that Minitab (or any other statistical software) does all the heavy lifting for this and all the other model selection procedures:
 1. Choose a prespecified value of α for deciding when to add a variable to the model.
    This α is called the entry level for a variable. Typically you choose α = 0.05 or α = 0.10 as the entry level. The higher the α level, the easier it is for a variable to enter the model.
 2. Start with the model containing no variables: $y = b_0$.
    You are left with just the constant term, $b_0$.
 3. Go through each x variable that could be added to the model, and use a t-test to check whether its coefficient is statistically significant at the entry level α.
    Each candidate is tested in a model that contains it along with any variables already selected. If the variable is statistically significant, it makes a significant contribution to predicting y, given the variables already in the model. Any variable that isn't statistically significant is out of the running to be added to the model at this point. (See Chapter 5 on conducting t-tests for regression coefficients.)
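The steps so far can be sketched in code. This is a minimal, standard-library-only Python sketch, not Minitab's implementation, and the function names (`t_test_p_value`, `invert`, `ols_fit`, `forward_selection`) are hypothetical. It assumes two things the steps above haven't spelled out yet: among the significant candidates, the one with the smallest p-value (equivalently, the largest |t|) enters, and the procedure stops when no remaining candidate is significant at the entry level. The two-sided p-value uses the closed-form t-distribution formulas for integer degrees of freedom (Abramowitz and Stegun 26.7.3 and 26.7.4), so no statistics library is needed.

```python
import math


def t_test_p_value(t_stat, df):
    """Two-sided p-value for a t statistic with integer df >= 1 (A&S 26.7.3-26.7.4)."""
    theta = math.atan(abs(t_stat) / math.sqrt(df))
    s, c2 = math.sin(theta), math.cos(theta) ** 2
    if df % 2 == 0:
        term = series = 1.0
        for j in range(df // 2 - 1):
            term *= (2 * j + 1) / (2 * j + 2) * c2
            series += term
        prob_inside = s * series              # P(|T| < |t|)
    else:
        prob_inside = theta
        if df > 1:
            term = series = math.cos(theta)
            for j in range((df - 1) // 2 - 1):
                term *= (2 * j + 2) / (2 * j + 3) * c2
                series += term
            prob_inside += s * series
        prob_inside *= 2 / math.pi            # P(|T| < |t|)
    return max(0.0, 1.0 - prob_inside)        # clamp tiny negative rounding error


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_fit(X, y):
    """Least squares; each row of X starts with 1.0 for b0. Returns (b, SEs, residual df)."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [y[r] - sum(b[i] * X[r][i] for i in range(p)) for r in range(n)]
    df = n - p
    mse = sum(e * e for e in resid) / df
    se = [math.sqrt(mse * inv[i][i]) for i in range(p)]
    return b, se, df


def forward_selection(columns, y, alpha=0.05):
    """columns maps each variable name to its x values; returns names in entry order."""
    n = len(y)
    selected = []                             # Step 2: start with y = b0 only
    while True:
        best = None                           # (p-value, name) of the best candidate
        for name in columns:
            if name in selected:
                continue
            # Step 3: fit the current model plus this one candidate and
            # t-test the candidate's coefficient (the last one in the fit).
            trial = selected + [name]
            X = [[1.0] + [columns[v][r] for v in trial] for r in range(n)]
            b, se, df = ols_fit(X, y)
            p = t_test_p_value(b[-1] / se[-1], df)
            # Assumption: only candidates with p < alpha (the entry level from
            # step 1) qualify, and the smallest p-value wins.
            if p < alpha and (best is None or p < best[0]):
                best = (p, name)
        if best is None:                      # Assumption: stop when nothing qualifies
            return selected
        selected.append(best[1])


if __name__ == "__main__":
    # Hypothetical data: y depends on x1; x2 is unrelated to y.
    x1 = [float(i) for i in range(20)]
    x2 = [float((7 * i) % 5) for i in range(20)]
    y = [1 + 2 * a + (0.5 if i % 3 == 0 else -0.25) for i, a in enumerate(x1)]
    print(forward_selection({"x1": x1, "x2": x2}, y, alpha=0.05))
```

With these made-up data, x1 enters first because its coefficient has by far the smallest p-value; whether x2 also enters depends on its t-test at the chosen entry level.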