Page 144 - Intermediate Statistics for Dummies
Chapter 6: One Step Forward and Two Steps Back: Regression Model Selection
Using the Best Subsets Procedure
The best subsets procedure presents yet another way to find a best multiple
regression model. It examines the fit of every possible model that could be
formulated from your x variables. You then use those model-fitting results
to decide which model is the best one to use.
In this section, you see how the best subsets procedure works for model
selection in a step-by-step manner. Then you see how to take all the
information given to you and wade through it to make your way to the answer:
the best-fitting model based on a subset of the available x variables. Finally,
you see how this procedure is applied to find a model to predict punt distance.
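To get a feel for how many models the procedure has to examine, here's a minimal Python sketch (not Minitab, and not part of the procedure itself) that lists every nonempty subset of three x variables. The names x1, x2, and x3 are hypothetical placeholders, not variables from the punt distance data.

```python
from itertools import combinations

# Hypothetical candidate x variables (placeholders, not the book's data).
x_names = ["x1", "x2", "x3"]

# Build every nonempty subset: all 1-variable models, then all 2-variable
# models, and so on up to the full model.
all_models = [
    subset
    for size in range(1, len(x_names) + 1)
    for subset in combinations(x_names, size)
]

for model in all_models:
    print(model)

# With k candidate variables there are 2**k - 1 nonempty subsets.
print(len(all_models), "models in total")
```

With k candidate x variables you get 2^k - 1 models, so the list grows quickly as you add variables, which is why you leave this job to the computer.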
Forming all models and choosing the best one
The best subsets procedure has fewer steps than the forward or backward
selection procedures because the computer formulates and analyzes all possible
models in a single step. In this section, you see how to get the results and then
use them to come up with a best multiple regression model for predicting y.
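If you're curious what the computer does in that single step, the following sketch mimics the idea in plain Python. Treat it as an illustration under stated assumptions, not as Minitab's actual algorithm: the data are made up, the helper names (solve, sse, best_subsets) are hypothetical, and the two fit measures use their usual textbook formulas, R² adjusted = 1 - [SSE/(n - p)] / [SST/(n - 1)] and C-p = SSE/MSE(full model) - (n - 2p). Here SSE is the model's sum of squared errors, SST is the total sum of squares, n is the number of observations, and p counts the model's coefficients, including the intercept.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def sse(columns, y):
    """Sum of squared errors for a least squares fit of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y
    XtX = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(p)] for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(XtX, Xty)
    return sum((y[i] - sum(bj * xj for bj, xj in zip(beta, X[i]))) ** 2 for i in range(n))

def best_subsets(data, y):
    """Fit every nonempty subset of x variables; return (subset, R2 adjusted, C-p) rows."""
    n = len(y)
    names = list(data)
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    p_full = len(names) + 1
    mse_full = sse([data[k] for k in names], y) / (n - p_full)
    rows = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            p = size + 1  # slopes plus the intercept
            s = sse([data[k] for k in subset], y)
            adj_r2 = 1 - (s / (n - p)) / (sst / (n - 1))
            cp = s / mse_full - (n - 2 * p)
            rows.append((subset, adj_r2, cp))
    return rows

# Made-up data for illustration only (not the punt distance data).
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [5, 3, 8, 1, 7, 2, 6, 4],
}
y = [3.0, 4.5, 6.1, 9.2, 9.8, 13.1, 13.9, 17.4]

rows = best_subsets(data, y)
for subset, adj_r2, cp in rows:
    print(f"{', '.join(subset):<12} R2adj={adj_r2:.3f}  Cp={cp:.2f}")
```

Each printed row plays the role of one row of best subsets output. By construction, the full model's C-p always equals its own number of coefficients, so C-p is most useful for comparing the smaller models against the full one.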
Here are the steps for conducting the best subsets model selection
procedure to select a multiple regression model (note that Minitab does all
the number crunching for you):
1. Conduct the best subsets procedure in Minitab, using all possible
subsets of the x variables being considered for inclusion in the
final model (see the nearby Computer Output icon).
The output contains a listing of all models that contain one x variable,
all models that contain two x variables, all models that contain three
x variables, and so on, all the way up to the full model (containing all the
x variables). Each model is presented in one row of the output.
2. Choose the best of all the models shown in the best subsets Minitab
output by finding the model with the largest value of R² adjusted and
the smallest value of Mallow’s C-p; if two competing models are about
equal, choose the model with fewer variables.
Mallow’s C-p is a measure of the amount of error in the predicted values
compared to the overall amount of variability in the data. If the model
fits well, the amount of error in the predicted values is small compared
to the overall variability in the data, and Mallow’s C-p will be small. So