Intermediate Statistics for Dummies

Chapter 6: One Step Forward and Two Steps Back: Regression Model Selection
Using the Best Subsets Procedure
The best subsets procedure presents yet another way to find the best multiple regression model. It basically examines the fit of every single possible model that could be formulated from your x variables. You then use those model-fitting results to decide which model is the best one to use.
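To see what "every single possible model" amounts to, here is a minimal Python sketch (not Minitab) that lists all non-empty subsets of a set of predictors, grouped from one-variable models up to the full model. The predictor names x1 through x4 are hypothetical placeholders, not variables from the punt data. With k candidate x variables there are 2^k − 1 such models, so four predictors already give 15.

```python
from itertools import combinations


def all_subsets(predictors):
    """Return every non-empty subset of the predictors, smallest subsets first."""
    subsets = []
    for size in range(1, len(predictors) + 1):
        # All models that contain exactly `size` x variables
        subsets.extend(combinations(predictors, size))
    return subsets


# Hypothetical predictor names, for illustration only
xs = ["x1", "x2", "x3", "x4"]
models = all_subsets(xs)
print(len(models))  # 2**4 - 1 = 15 candidate models
for m in models[:5]:
    print(m)
```

Because the count doubles with each added x variable, an exhaustive search like this stays practical only for a modest number of predictors.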
In this section, you see how the best subsets procedure works for model selection in a step-by-step manner. Then you see how to take all the information given to you and wade through it to make your way to the answer: the best-fitting model based on a subset of the available x variables. Finally, you see how this procedure is applied to find a model to predict punt distance.
Forming all models and choosing the best one
The best subsets procedure has fewer steps than the forward or backward selection procedures because the computer formulates and analyzes all possible models in a single step. In this section, you see how to get the results and then use them to come up with the best multiple regression model for predicting y.
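As a rough sketch of what the software computes in that single step, the Python code below (using NumPy, not Minitab) fits every subset by least squares with an intercept and reports R², adjusted R², and Mallows’ C-p for each model. It assumes the common textbook definitions: with n observations and p parameters in a model (its x variables plus the intercept), adjusted R² = 1 − [SSE/(n − p)] / [SST/(n − 1)], and C-p = SSE/MSE(full) − (n − 2p), where MSE(full) is the mean squared error of the model that uses all the x variables. The function names and the simulated data are hypothetical, for illustration only; Minitab's own output and computations may differ in detail.

```python
from itertools import combinations

import numpy as np


def fit_sse(X, y):
    """Fit y on X plus an intercept by least squares; return the sum of squared errors."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def best_subsets(X, y, names):
    """Fit every non-empty subset of the columns of X; return one row per model."""
    n, k = X.shape
    sst = float(((y - y.mean()) ** 2).sum())
    # Error variance estimated from the full model (all k x variables)
    mse_full = fit_sse(X, y) / (n - k - 1)
    rows = []
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            sse = fit_sse(X[:, list(cols)], y)
            p = size + 1  # parameters in this model, counting the intercept
            r2 = 1 - sse / sst
            r2_adj = 1 - (sse / (n - p)) / (sst / (n - 1))
            cp = sse / mse_full - (n - 2 * p)
            rows.append(([names[c] for c in cols], r2, r2_adj, cp))
    return rows


# Simulated data (hypothetical): y depends on x1 and x2 but not on x3
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = 2 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=30)

rows = best_subsets(X, y, ["x1", "x2", "x3"])
for vars_, r2, r2_adj, cp in rows:
    print(f"{' '.join(vars_):<10} R2={r2:.3f}  R2(adj)={r2_adj:.3f}  Cp={cp:.2f}")
```

Under this definition of C-p, the full model always gets a C-p exactly equal to its own number of parameters (k + 1), which is a handy sanity check on the output.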
Here are the steps for conducting the best subsets model selection procedure to select a multiple regression model (note that Minitab does all the number crunching for you):
 1. Conduct the best subsets procedure in Minitab, using all possible subsets of the x variables being considered for inclusion in the final model (see the nearby Computer Output icon).
    The output contains a listing of all models that contain one x variable, all models that contain two x variables, all models that contain three x variables, and so on, all the way up to the full model (containing all the x variables). Each model is presented in one row of the output.
 2. Choose the best of all the models shown in the Minitab best subsets output by finding the model with the largest value of R² adjusted and the smallest value of Mallows’ C-p; if two competing models are about equal, choose the model with fewer variables.
    Mallows’ C-p is a measure of the amount of error in the predicted values compared to the overall amount of variability in the data. If the model fits well, the amount of error in the predicted values is small compared to the overall variability in the data, and Mallows’ C-p will be small. So