Page 145 - Intermediate Statistics for Dummies

                                         Part II: Making Predictions by Using Regression
                                                        look for a model that has a small value of Mallows's C-p compared to its
                                                        competitors. R² adjusted measures how much of the variability in the
                                                        y-values can be explained by the model, adjusted for the number of
                                                        variables included. (R² adjusted usually falls between 0 and 100 percent;
                                                        see the section “How well does the model fit?” earlier in this chapter.)
                                                        If the model fits well, R² adjusted is high. So you want to look for the
                                                        smallest possible model that has a high value of R² adjusted and a small
                                                        value of Mallows's C-p compared to its competitors. And if it comes down
                                                        to two similar models, you always want to make your final model as easy
                                                        to interpret as possible by selecting the model with fewer variables.
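
For readers who want to see what sits behind these two numbers, here are the standard textbook definitions. The symbols are assumptions introduced for illustration, not notation from this chapter: n is the number of observations, p is the number of x variables in the candidate model, SSE_p is that model's error sum of squares, SST is the total sum of squares, and MSE_full is the mean squared error of the model containing every available x variable.

```latex
R^2_{\text{adj}} = 1 - \frac{SSE_p / (n - p - 1)}{SST / (n - 1)}
\qquad
C_p = \frac{SSE_p}{MSE_{\text{full}}} - \bigl(n - 2(p + 1)\bigr)
```

Under these definitions, the full model always has a C-p equal to its number of x variables plus one, so a smaller candidate model looks good when its C-p is small and close to p + 1.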
                                                    To carry out the best subsets selection procedure in Minitab, go to Stat>
                                                    Regression>Best Subsets. Highlight the response variable (y), and click Select.
                                                    Highlight all the predictor (x) variables, and click Select. Click on OK.
                                                    Applying best subsets to the punt distance example
                                                    Say that you analyzed the punt distance data by using the best subsets model
                                                    selection procedure. Your results are shown in Figure 6-5. This section fol-
                                                    lows Minitab’s footsteps in getting these results, and provides you with a
                                                    guide for interpreting the results.
                                                    Poring over the output
                                                    Assuming that you already used Minitab to carry out the best subsets selec-
                                                    tion procedure on the punt distance data, you can now analyze the output
                                                    from Figure 6-5. Each variable shows up as a column on the right side of the
                                                    output. Each row represents the results from a model containing the number
                                                    of variables shown in column one. The X’s at the end of each row tell you
                                                    which variables were included in that model. The number of variables in the
                                                    model starts at one and increases to six because six x variables are available
                                                    in the data set.
                                                    The models with the same number of variables are ordered by their values of
                                                    R² adjusted and Mallows's C-p, from best to worst. The top two models (for
                                                    each number of variables) are included in the computer output.
                                                    For example, rows one and two of Figure 6-5 (both marked 1 in the Vars
                                                    column) show the top two models containing one x variable; rows three and
                                                    four show the top two models containing two x variables (and so on). Finally,
                                                    the last row of Figure 6-5 shows the results of the full model containing all six
                                                    variables. (Only one model contains all six variables, so you don’t have a
                                                    second-best model in this case.)
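
If you want to see how a table laid out like this comes about, the following is a minimal sketch in Python with NumPy. It is not Minitab's implementation and it does not use the punt distance data: the data are simulated, the variable names x1 through x6 are placeholders, and the functions `sse` and `best_subsets` are hypothetical helpers written for this illustration. The sketch fits every possible subset of the x variables, computes R² adjusted and Mallows's C-p from their standard definitions, and keeps the top two models of each size.

```python
from itertools import combinations

import numpy as np


def sse(X, y):
    """Residual sum of squares from a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def best_subsets(X, y, keep=2):
    """Return (size, adj R-sq in percent, C-p, column indices) for the
    top `keep` models of each size, best first within each size."""
    n, k = X.shape
    sst = float(((y - y.mean()) ** 2).sum())
    mse_full = sse(X, y) / (n - k - 1)  # error variance estimate, full model
    rows = []
    for size in range(1, k + 1):
        fits = []
        for cols in combinations(range(k), size):
            s = sse(X[:, list(cols)], y)
            adj_r2 = 100 * (1 - (s / (n - size - 1)) / (sst / (n - 1)))
            cp = s / mse_full - (n - 2 * (size + 1))
            fits.append((size, adj_r2, cp, cols))
        # Within one size, a higher adjusted R-sq also means a lower C-p.
        fits.sort(key=lambda f: f[1], reverse=True)
        rows.extend(fits[:keep])
    return rows


# Simulated data (an assumption for illustration): only x1 and x3 matter.
rng = np.random.default_rng(0)
names = ["x1", "x2", "x3", "x4", "x5", "x6"]
X = rng.normal(size=(40, 6))
y = 3 * X[:, 0] + 2 * X[:, 2] + rng.normal(size=40)

rows = best_subsets(X, y)
print("Vars  R-Sq(adj)      C-p  " + " ".join(names))
for size, adj_r2, cp, cols in rows:
    marks = "  ".join("X" if j in cols else " " for j in range(len(names)))
    print(f"{size:>4} {adj_r2:>10.1f} {cp:>8.1f}  {marks}")
```

With six x variables the sketch prints eleven rows: two for each model size from one to five, and a single row for the full model. Fitting every subset is fine for a handful of variables, but the number of candidate models doubles with each x variable you add.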