Page 140 - Intermediate Statistics for Dummies

                                          Chapter 6: One Step Forward and Two Steps Back: Regression Model Selection
                                                    variables and adding x variables one by one until you stop, you start with
                                                    all the x variables in the model and remove x variables one by one until you
                                                    stop. You may think that the forward selection procedure and the backward
                                                    selection procedure would give you the same final model, but in many cases
                                                    they don’t, which you can discover in the sections that follow.
                                                    Eliminating variables one by one
                                                    The backward selection procedure starts out with the full multiple regression
                                                    model containing all k of the x variables. The starting model is
                                                    $y = b_0 + b_1 x_1 + \dots + b_k x_k$. The object is to whittle down the
                                                    model so it includes the fewest variables needed to still fit well.
                                                    (Statisticians, as mysterious, mystical, and complicated as they may seem,
                                                    actually like their models to be as simple as possible!)
                                                    The computer does all the work for all model selection procedures, but you
                                                    have to set the criteria for when to allow a variable to be removed. You’re
                                                    also left standing with the output that needs to be interpreted. Don’t worry
                                                    though. It’s all a step-by-step process that you take one at a time. (Hopefully
                                                    those steps are forward and not backward, right? Right.)
                                                    In general, here’s how the backward selection procedure works (note that
                                                    Minitab does all the work for you on this procedure; all you have to do is
                                                    interpret the results and understand the process by which those results
                                                    were attained):
                                                     1. Choose a prespecified value of α for determining when to remove a
                                                        variable from the model.
                                                        In the backward selection procedure, you call α the removal level.
                                                        Typically you want to choose the removal level α = 0.10. A variable is
                                                        removed only when the p-value of its t-test is larger than α, so the lower
                                                        the α level, the easier it is to remove a variable from the model.
                                                        Statisticians warn against using a removal level much lower than the
                                                        traditional value of 0.10 for fear of dropping variables out of the model
                                                        too quickly, removing important contributions that may be made by those
                                                        variables. However, if α is too large, the model could wind up being
                                                        overly complex.
                                                     2. Start with the model containing all of the x variables:
                                                        $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$, where k is the total
                                                        number of x variables.
                                                        Remember that this model is called the full model.
                                                     3. Conduct a t-test on the coefficient of each x variable to see whether
                                                        it’s statistically significant (see Chapter 5 for conducting t-tests on
                                                        coefficients of a multiple regression model), and note the p-value of
                                                        each t-test.
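
If you want to see what the software is doing behind the scenes, Steps 1 through 3 can be sketched in a few lines of Python. Treat this as a rough illustration under stated assumptions, not as Minitab's actual method: the data set is made up, the function names (solve_normal_equations, t_two_sided_p, coefficient_t_tests) are hypothetical, and the p-values come from a simple numerical integration of the t-distribution, so very small p-values are only approximate. The last column, which compares each p-value with α, assumes the usual rule that a p-value larger than the removal level marks a variable as a candidate for removal.

```python
import math

def solve_normal_equations(X, y):
    """Least-squares fit: return (coefficients, inverse of X'X) via Gauss-Jordan."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Augment X'X with the identity matrix so elimination yields the inverse.
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    inv = [row[p:] for row in aug]
    coefs = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    return coefs, inv

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (area under the t density from 0 to |t|)."""
    b = abs(t)
    if b == 0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule on [0, |t|] with an even number of intervals.
    h = b / steps
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def coefficient_t_tests(xs, y):
    """Fit the full model and t-test each x coefficient (Steps 2 and 3).

    Returns a list of (variable index, coefficient, t statistic, p-value).
    """
    X = [[1.0] + list(row) for row in xs]      # leading 1.0 is the intercept b0
    n, p = len(X), len(X[0])
    coefs, inv = solve_normal_equations(X, y)
    resid = [yi - sum(b * v for b, v in zip(coefs, row)) for row, yi in zip(X, y)]
    df = n - p                                 # error degrees of freedom
    mse = sum(e * e for e in resid) / df
    results = []
    for j in range(1, p):                      # skip the intercept
        se = math.sqrt(mse * inv[j][j])
        t = coefs[j] / se
        results.append((j, coefs[j], t, t_two_sided_p(t, df)))
    return results

# Step 1: choose the removal level (the traditional value from the text).
alpha = 0.10

# Made-up data for illustration only: y is roughly 2 * x1 plus small noise,
# while x2 and x3 are unrelated to y.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [5, 3, 8, 1, 9, 2, 7, 4, 10, 6]
x3 = [3, 7, 1, 9, 4, 8, 2, 10, 6, 5]
y = [2.3, 3.8, 6.1, 7.6, 10.2, 12.5, 13.7, 16.1, 17.9, 19.8]
xs = list(zip(x1, x2, x3))

# Steps 2 and 3: fit the full model and note the p-value of each t-test.
results = coefficient_t_tests(xs, y)
for j, coef, t, pval in results:
    print(f"x{j}: b = {coef:.3f}, t = {t:.2f}, p = {pval:.4f}, p > alpha: {pval > alpha}")
```

In practice you would let Minitab (or another statistics package) produce these numbers; the point of the sketch is only to show that each x variable gets its own t-test and its own p-value from the same full-model fit.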