Page 128 - Statistics II for Dummies

Part II: Using Different Types of Regression to Make Predictions
all the x variables and find the one with the largest p-value. If the p-value of
this x variable is higher than the removal level, that variable is taken out of
the model.

You continue removing variables from the model until the one with the largest
p-value doesn’t exceed the removal level. Then you stop.

The drawback of the backward selection procedure is that it starts with
everything and removes variables one at a time as you go along; after a variable
is removed, it never comes back. Again, the best model might not even be tested.
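
To see the removal loop in action, here's a minimal Python sketch of backward
selection. It isn't Minitab's implementation; the function names (solve,
p_values, backward_select), the removal level of 0.10, and the small made-up
data set are all assumptions for illustration. To stay self-contained, the
sketch approximates each p-value with the normal distribution instead of the
t-distribution, which is reasonable only for large samples.

```python
import math

def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def p_values(columns, y):
    """Fit y on the columns (plus an intercept); return one p-value per column.

    Assumption: a normal approximation stands in for the t-distribution.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[r] * row[c] for row in X) for c in range(k)] for r in range(k)]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    beta = solve(xtx, xty)
    sse = sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in range(n))
    s2 = sse / (n - k)  # estimated error variance
    result = []
    for j in range(1, k):  # skip the intercept
        unit = [1.0 if r == j else 0.0 for r in range(k)]
        var_j = s2 * solve(xtx, unit)[j]  # j-th diagonal of s2 * inverse(X'X)
        t = beta[j] / math.sqrt(var_j)
        result.append(math.erfc(abs(t) / math.sqrt(2)))  # two-sided p-value
    return result

def backward_select(data, y, removal_level=0.10):
    """Start with every x variable; drop the largest p-value until none exceeds the level."""
    kept = list(data)
    while kept:
        p = p_values([data[name] for name in kept], y)
        worst = max(range(len(kept)), key=lambda j: p[j])
        if p[worst] <= removal_level:
            break  # the largest p-value doesn't exceed the removal level, so stop
        kept.pop(worst)  # once removed, a variable never comes back
    return kept

# Hypothetical data: y depends mostly on x1; x2 and x3 are close to pure noise.
data = {
    "x1": [-1, -1, 1, 1, -1, -1, 1, 1],
    "x2": [-1, 1, -1, 1, -1, 1, -1, 1],
    "x3": [1, -1, -1, 1, 1, -1, -1, 1],
}
y = [2.5, 1.7, 7.6, 8.2, 1.5, 2.4, 8.3, 7.8]

kept = backward_select(data, y)
print(kept)
```

Notice that the loop refits the model after every removal, because the
p-values of the remaining variables can change once a variable leaves.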

Using the best subsets procedure
The best subsets procedure has fewer steps than the forward or backward
selection procedures because the computer formulates and analyzes all possible
models in a single step. In this section, you see how to get the results and then
use them to come up with a best multiple regression model for predicting y.

Here are the steps for conducting the best subsets model selection procedure
to select a multiple regression model; note that Minitab does all the number
crunching for you:

1. Conduct the best subsets procedure in Minitab, using all possible
subsets of the x variables being considered for inclusion in the final
model.
To carry out the best subsets selection procedure in Minitab, go to
Stat>Regression>Best Subsets. Highlight the response variable (y), and
click Select. Highlight all the predictor (x) variables, click Select, and
then click OK.
The output contains a listing of all models that contain one x variable,
all models that contain two x variables, all models that contain three x
variables, and so on, all the way up to the full model (containing all the x
variables). Each model is presented in one row of the output.
2. Choose the best of all the models shown in the best subsets Minitab
output by finding the model with the largest value of R² adjusted and
the smallest value of Mallows's C-p; if two competing models are about
equal, choose the model with fewer variables.
If the model fits well, R² adjusted is high. So you also want to look for the
smallest possible model that has a high value of R² adjusted and a small
value of Mallows's C-p compared to its competitors. And if it comes down
to two similar models, always make your final model as easy to interpret
as possible by selecting the model with fewer variables.
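
If you want to see what the computer is doing behind the scenes in these two
steps, here's a minimal Python sketch. It isn't Minitab's code; the function
names (solve, sse, best_subsets) and the small made-up data set are assumptions
for illustration. The sketch fits every subset of the x variables and computes
R² adjusted as 1 - [SSE/(n - p)] / [SST/(n - 1)] and Mallows's C-p as
SSE/MSE(full model) - n + 2p, where p counts the coefficients in the model,
including the intercept.

```python
from itertools import combinations

def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def sse(columns, y):
    """Residual sum of squares from a least-squares fit with an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[r] * row[c] for row in X) for c in range(k)] for r in range(k)]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in range(n))

def best_subsets(data, y):
    """Return one (subset, R2 adjusted, C-p) row per nonempty subset of the x variables."""
    n = len(y)
    names = list(data)
    mean_y = sum(y) / n
    sst = sum((v - mean_y) ** 2 for v in y)
    mse_full = sse([data[v] for v in names], y) / (n - len(names) - 1)
    rows = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            p = size + 1  # coefficients in this model, counting the intercept
            err = sse([data[v] for v in subset], y)
            adj_r2 = 1 - (err / (n - p)) / (sst / (n - 1))
            cp = err / mse_full - n + 2 * p
            rows.append((subset, adj_r2, cp))
    return rows

# Hypothetical data: y depends mostly on x1; x2 and x3 are close to pure noise.
data = {
    "x1": [-1, -1, 1, 1, -1, -1, 1, 1],
    "x2": [-1, 1, -1, 1, -1, 1, -1, 1],
    "x3": [1, -1, -1, 1, 1, -1, -1, 1],
}
y = [2.5, 1.7, 7.6, 8.2, 1.5, 2.4, 8.3, 7.8]

rows = best_subsets(data, y)
for subset, adj_r2, cp in rows:  # one row per model, as in the Minitab output
    print(len(subset), subset, round(adj_r2, 4), round(cp, 4))

best_by_adj_r2 = max(rows, key=lambda row: row[1])
best_by_cp = min(rows, key=lambda row: row[2])
```

In this made-up example, both criteria point to the same one-variable model,
so the choice is easy. When they disagree or nearly tie, the rule from Step 2
still applies: go with the model that has fewer variables.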

