Page 145 - Intermediate Statistics for Dummies
Part II: Making Predictions by Using Regression
look for a model that has a small value of Mallows’s C-p compared to its
competitors. R² adjusted measures how much of the variability in the
y-values can be explained by the model, adjusted for the number of variables
included. (R² adjusted ranges from 0 to 100 percent; see the section
“How well does the model fit?” earlier in this chapter.) If the model fits
well, R² adjusted is high. So you want to look for the smallest possible
model that has a high value of R² adjusted and a small value of
Mallows’s C-p compared to its competitors. And if it comes down to two
similar models, make your final model as easy to interpret as possible
by selecting the model with fewer variables.
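If you want to see how the two criteria respond to model size, here is a minimal Python sketch. It assumes the usual textbook formulas for a model with p predictors plus an intercept, fit to n observations; the function names and every number in it are hypothetical and are not taken from the punt distance data.

```python
def adjusted_r2(sse, sst, n, p):
    """Adjusted R-squared for a model with p predictors plus an intercept."""
    return 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))


def mallows_cp(sse, mse_full, n, p):
    """Mallows's C-p for a model with p predictors plus an intercept."""
    return sse / mse_full - n + 2 * (p + 1)


# Hypothetical numbers, for illustration only.
n = 20             # number of observations
sst = 1900.0       # total sum of squares
sse_full = 130.0   # error sum of squares for the full model (6 predictors)
mse_full = sse_full / (n - 6 - 1)   # 10.0

sse_small = 170.0  # error sum of squares for a candidate with 2 predictors
r2_adj_small = adjusted_r2(sse_small, sst, n, 2)   # 0.9, or 90 percent
cp_small = mallows_cp(sse_small, mse_full, n, 2)   # 3.0
cp_full = mallows_cp(sse_full, mse_full, n, 6)     # 7.0
```

In this made-up case, the two-variable model has a C-p of 3 while the full model has a C-p of 7, so the smaller model looks like the stronger competitor. The full model's C-p always equals its number of coefficients (here, six slopes plus the intercept), which is why you compare C-p across competitors rather than reading it in isolation.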
To carry out the best subsets selection procedure in Minitab, go to Stat>
Regression>Best Subsets. Highlight the response variable (y), and click Select.
Highlight all the predictor (x) variables, and click Select. Click on OK.
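If you don't have Minitab handy, the idea behind best subsets can be sketched in plain Python. This is not Minitab's algorithm, just a brute-force illustration under stated assumptions: it fits every possible subset of predictors by least squares and computes R² adjusted and Mallows's C-p for each one. The data are randomly generated, and the variable names x1, x2, and x3 and the helper functions solve and sse_for are hypothetical.

```python
from itertools import combinations
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(m):
        pivot = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, m):
            f = aug[r][c] / aug[c][c]
            for j in range(c, m + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * m
    for c in reversed(range(m)):
        s = sum(aug[c][j] * x[j] for j in range(c + 1, m))
        x[c] = (aug[c][m] - s) / aug[c][c]
    return x


def sse_for(rows, y, cols):
    """Error sum of squares for least squares on the chosen columns plus an intercept."""
    design = [[1.0] + [row[j] for j in cols] for row in rows]
    m = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(m)] for i in range(m)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(m)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2 for d, yi in zip(design, y))


# Synthetic data for illustration only (not the punt distance data).
random.seed(0)
names = ["x1", "x2", "x3"]
n, k = 30, len(names)
rows = [[random.gauss(0, 1) for _ in names] for _ in range(n)]
y = [2 + 3 * r[0] - 1.5 * r[1] + random.gauss(0, 0.5) for r in rows]

y_bar = sum(y) / n
sst = sum((yi - y_bar) ** 2 for yi in y)
mse_full = sse_for(rows, y, range(k)) / (n - k - 1)

# Fit every subset of size 1 through k and record both criteria.
results = []
for p in range(1, k + 1):
    for cols in combinations(range(k), p):
        sse = sse_for(rows, y, cols)
        r2_adj = 1 - (sse / (n - p - 1)) / (sst / (n - 1))
        cp = sse / mse_full - n + 2 * (p + 1)
        results.append((p, 100 * r2_adj, cp, [names[j] for j in cols]))

for p, r2_adj_pct, cp, chosen in results:
    print(f"{p}  {r2_adj_pct:6.1f}  {cp:8.1f}  {' '.join(chosen)}")
```

With three predictors the loop fits only seven models, but the count doubles with every predictor you add, so brute force like this is practical only for small data sets.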
Applying best subsets to the
punt distance example
Say that you analyzed the punt distance data by using the best subsets model
selection procedure. Your results are shown in Figure 6-5. This section fol-
lows Minitab’s footsteps in getting these results, and provides you with a
guide for interpreting the results.
Poring over the output
Assuming that you already used Minitab to carry out the best subsets selec-
tion procedure on the punt distance data, you can now analyze the output
from Figure 6-5. Each variable shows up as a column on the right side of the
output. Each row represents the results from a model containing the number
of variables shown in column one. The X’s at the end of each row tell you
which variables were included in that model. The number of variables in the
model starts at one and increases to six because six x variables are available
in the data set.
The models with the same number of variables are ordered by their values of
R² adjusted and Mallows’s C-p, from best to worst. The top two models (for
each number of variables) are included in the computer output.
For example, rows one and two of Figure 6-5 (both marked 1 in the Vars
column) show the top two models containing one x variable; rows three and
four show the top two models containing two x variables (and so on). Finally,
the last row of Figure 6-5 shows the results of the full model containing all six
variables. (Only one model contains all six variables, so you don’t have a
second-best model in this case.)
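The row layout just described can be sketched in Python as well. The snippet below takes a list of already-fitted models, keeps the top two for each number of variables, and prints an X under each variable a model includes. The records, the variable names A, B, and C, the column headings, and the function top_models are all invented for illustration; they are not the values from Figure 6-5, and the printed layout only loosely resembles Minitab's.

```python
from itertools import groupby

# Hypothetical records: (number of variables, R-sq adj in percent, C-p, variables).
all_vars = ["A", "B", "C"]
records = [
    (1, 60.2, 14.1, {"A"}),
    (1, 41.5, 25.3, {"B"}),
    (1, 12.0, 40.8, {"C"}),
    (2, 78.9, 3.2, {"A", "B"}),
    (2, 61.0, 13.0, {"A", "C"}),
    (2, 40.1, 24.9, {"B", "C"}),
    (3, 78.1, 4.0, {"A", "B", "C"}),
]


def top_models(records, keep=2):
    """Keep the best `keep` models for each model size, ranked by R-sq adj."""
    ordered = sorted(records, key=lambda r: (r[0], -r[1]))
    kept = []
    for _, group in groupby(ordered, key=lambda r: r[0]):
        kept.extend(list(group)[:keep])
    return kept


table = top_models(records)
print("Vars  R-Sq(adj)   C-p  " + " ".join(all_vars))
for size, r2_adj, cp, chosen in table:
    marks = " ".join("X" if v in chosen else " " for v in all_vars)
    print(f"{size:4d}  {r2_adj:9.1f}  {cp:4.1f}  {marks}")
```

Sorting on R² adjusted alone is enough within a group: when two models have the same number of variables and are fit to the same data, the one with the smaller error sum of squares has both the higher R² adjusted and the lower C-p, so the two criteria never disagree about the order.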