Page 143 - Intermediate Statistics for Dummies

                                         Part II: Making Predictions by Using Regression
                                                    Removing one variable: The Step 2 column
                                                    Notice in the Step 2 column of Figure 6-4 that the left leg strength variable
                                                    no longer appears (and it stays that way), because it has the highest
                                                    p-value at Step 1 and that p-value is larger than the removal level of 0.10.
                                                    This is the work of the backward selection procedure. It operates in
                                                    only-the-strong-survive mode when it comes to variable elimination.
                                                    In looking at the p-values for this new model in the Step 2 column, you see
                                                    the variable with the highest p-value is hang time (0.874). This result doesn’t
                                                    make sense at first because in Table 6-2 you saw hang time had the strongest
                                                    relationship with punt distance.
                                                    However, remember what the p-value represents here — the significance of
                                                    the variable in its contribution to y, given all the other variables already in
                                                    the model. Because so many of the other variables in the model were shown
                                                    to be correlated with hang time (see Figure 6-1), it makes sense that hang
                                                    time could possibly be eliminated somewhere near the beginning of this
                                                    procedure.
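To see how a variable can look strong on its own yet add little once its correlated partners are in the model, here is a minimal sketch that uses the standard first-order partial correlation formula. The three correlation values are hypothetical numbers chosen for illustration; they are not taken from the punt distance data, and the function name is my own.

```python
import math


def partial_correlation(r_y1: float, r_y2: float, r_12: float) -> float:
    """Correlation between y and x1 after the linear effect of x2
    has been removed from both of them."""
    return (r_y1 - r_y2 * r_12) / math.sqrt((1 - r_y2**2) * (1 - r_12**2))


# Hypothetical correlations (assumptions, not values from the book):
r_y_x1 = 0.80   # y with x1: strong on its own
r_y_x2 = 0.85   # y with x2: also strong
r_x1_x2 = 0.95  # x1 and x2 are highly correlated with each other

leftover = partial_correlation(r_y_x1, r_y_x2, r_x1_x2)
print(round(leftover, 3))  # close to zero
```

With these made-up numbers, x1 has a correlation of 0.80 with y by itself, but almost nothing is left for it to explain once x2 is accounted for. That is the same kind of situation that can push a variable like hang time out of the model early.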
                                                    Working down to the final model: The Step 3 column and beyond
                                                    The Step 3 column of Figure 6-4 shows the model without left leg strength or
                                                    hang time. The next variable to be removed is left leg flexibility, which has a
                                                    p-value of 0.574. Looking at the Step 4 column of Figure 6-4, the next variable
                                                    to be removed is right leg flexibility, which has a p-value of 0.346.
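The step-by-step removal described so far can be sketched as a short loop. This is only an outline of the idea, not what Minitab does internally: the function name backward_select, the fit_pvalues callback, and the toy p-values are all assumptions made up for illustration. In real use, fit_pvalues would refit the regression on the variables still in the model and return their p-values.

```python
from typing import Callable, Dict, List


def backward_select(
    variables: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    removal_level: float = 0.10,
) -> List[str]:
    """Remove the variable with the highest p-value, one per step,
    until no remaining p-value is larger than removal_level."""
    kept = list(variables)
    while kept:
        pvalues = fit_pvalues(kept)  # refit using only the variables still kept
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] <= removal_level:
            break  # every remaining variable passes, so stop
        kept.remove(worst)  # once removed, a variable never comes back
    return kept


def toy_pvalues(kept: List[str]) -> Dict[str, float]:
    # Hypothetical p-values for a made-up three-variable problem.
    table = {
        ("a", "b", "c"): {"a": 0.02, "b": 0.60, "c": 0.30},
        ("a", "c"): {"a": 0.01, "c": 0.04},
    }
    return table[tuple(kept)]


final = backward_select(["a", "b", "c"], toy_pvalues)
print(final)  # ['a', 'c']
```

Notice that the p-values are recomputed after every removal. That is why c, which looked weak at the first step, can end up passing once b is gone.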
                                                    After right leg flexibility is removed from the model, you can see the result in
                                                    Step 5 of Figure 6-4. All the remaining variables in the model have p-values
                                                    smaller than the level for removal, which is 0.10. This means you stop the
                                                    backward selection procedure and keep the model you’ve got. The final
                                                    model for the punt distance data using the backward selection procedure
                                                    with removal level 0.10 is y = 12.77 + 0.56x₁ + 0.27x₂, where x₁ = right leg
                                                    strength and x₂ = overall leg strength. The final value of R² adjusted is 74.14
                                                    percent, which isn’t all that bad. (I’ve seen higher values of R², but I’ve also
                                                    seen a lot worse.) Mallows cheers this model on with a C-p value of 0, which
                                                    has been rounded off a bit.
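Using the final model is just a matter of plugging in values. Here is a minimal sketch that wraps the equation y = 12.77 + 0.56(right leg strength) + 0.27(overall leg strength) in a function. The function name and the two input values are hypothetical; they don't describe any punter in the data set.

```python
def predict_punt_distance(right_leg_strength: float,
                          overall_leg_strength: float) -> float:
    """Predicted punt distance from the final backward selection model."""
    return 12.77 + 0.56 * right_leg_strength + 0.27 * overall_leg_strength


# Hypothetical inputs, chosen only to show the arithmetic:
# 12.77 + 0.56 * 150 + 0.27 * 200 = 12.77 + 84 + 54
example = predict_punt_distance(150, 200)
print(round(example, 2))  # 150.77
```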
                                                    Always remember to use R² adjusted rather than R² to assess the fit of
                                                    your model at each step of any selection procedure, and here’s why: In the
                                                    punt distance example, the values of R² and R² adjusted appear on the second
                                                    and third lines from the bottom of the Minitab output in Figure 6-4. You can
                                                    see that with each step, the values of R² decrease because fewer variables are
                                                    in the model to contribute something to predicting y. However, the values of
                                                    R² adjusted increase because the adjustment needed for the number of
                                                    variables in the model goes down. Each variable left in the model is providing
                                                    more bang for the buck in terms of helping predict y.
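The tug-of-war between the two measures is easier to see with the usual formula, R² adjusted = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k is the number of x variables in the model. The sketch below uses hypothetical values for n, k, and R² (written as proportions rather than percents); they are not read from Figure 6-4.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """R-squared adjusted for k predictors fit to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 13  # hypothetical sample size

# Hypothetical bigger model: 5 variables, slightly higher R-squared.
big_model = adjusted_r2(0.80, n, k=5)

# Hypothetical smaller model: 2 variables, slightly lower R-squared.
small_model = adjusted_r2(0.78, n, k=2)

print(round(big_model, 3), round(small_model, 3))  # 0.657 0.736
```

In this made-up case, dropping three variables costs only two percentage points of R², but R² adjusted goes up because the penalty for carrying extra variables shrinks.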