Page 149 - Intermediate Statistics for Dummies
Part II: Making Predictions by Using Regression
Because all three model selection procedures are available in Minitab, you
may be tempted to run all three procedures, see what you get, and choose the
result you like best. Don't. That approach is called data fishing or data
snooping, and it can lead to conclusions that others can't confirm (for more
on these no-no's, flip to Chapter 1).
Examining the downsides
The forward and backward selection procedures are somewhat limited in the
way they build their models. For example, after hang time is eliminated in
the backward selection procedure (in Figure 6-4), it never appears again in
any later model. Likewise, after hang time is added in the forward selection
procedure, it stays in every model from then on. The best subsets procedure
(in Figure 6-5), on the other hand, examines all possible models, both those
that contain hang time and those that don't.
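To make the "once in, always in" and "once out, always out" behavior concrete, here is a minimal sketch of both greedy procedures in Python. It is not how Minitab implements them: instead of the p-value thresholds a statistics package typically uses to decide whether a variable enters or leaves, it takes a generic `score` function for a candidate model and stops when no single change improves the score. The variable names `x1` through `x6`, the `contribution` values, and `toy_score` are all hypothetical stand-ins, not the book's data.

```python
from typing import Callable, FrozenSet, List, Sequence

Model = FrozenSet[str]


def forward_selection(variables: Sequence[str],
                      score: Callable[[Model], float],
                      min_gain: float = 0.0) -> List[Model]:
    """Start with no variables; repeatedly add the one that raises the score most.
    Returns every model visited, so you can see that each contains the last."""
    current: Model = frozenset()
    path: List[Model] = [current]
    remaining = set(variables)
    while remaining:
        # Try each remaining variable; sorted() makes ties reproducible.
        best_var = max(sorted(remaining), key=lambda v: score(current | {v}))
        if score(current | {best_var}) - score(current) <= min_gain:
            break
        current = current | {best_var}  # once added, never considered for removal
        remaining.discard(best_var)
        path.append(current)
    return path


def backward_elimination(variables: Sequence[str],
                         score: Callable[[Model], float],
                         max_loss: float = 0.0) -> List[Model]:
    """Start with all variables; repeatedly drop the one whose removal hurts least."""
    current: Model = frozenset(variables)
    path: List[Model] = [current]
    while current:
        drop = max(sorted(current), key=lambda v: score(current - {v}))
        if score(current) - score(current - {drop}) > max_loss:
            break
        current = current - {drop}  # once removed, never considered for re-entry
        path.append(current)
    return path


# Hypothetical example: a made-up score with a penalty of 1.0 per variable.
variables = ["x1", "x2", "x3", "x4", "x5", "x6"]
contribution = {"x1": 5.0, "x2": 3.0, "x3": 0.5, "x4": 2.0, "x5": 0.2, "x6": 1.5}


def toy_score(model: Model) -> float:
    return sum(contribution[v] for v in model) - 1.0 * len(model)


for m in forward_selection(variables, toy_score):
    print("forward: ", sorted(m))
for m in backward_elimination(variables, toy_score):
    print("backward:", sorted(m))
```

Each procedure walks a single nested path of models, so many combinations (for example, any model that contains `x5` but not `x1`) are never scored at all.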
Standing out above the rest: The best subsets procedure
Because of its versatility and the comprehensive way it examines all possible
models, the best subsets procedure is generally the method of choice among
statisticians. With six candidate variables, each of which is either included
in or excluded from the model, you have
$2 \times 2 \times 2 \times 2 \times 2 \times 2 = 2^6 = 64$
possible models to look at in the best subsets procedure. Notice that this
set of all 64 possible models includes every model shown in the step-by-step
process of forward and backward selection.
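The counting argument and the "look at everything" idea can be sketched in a few lines of Python. This is an illustration under assumptions, not Minitab's implementation: a package would typically rank the models of each size by R² or a similar criterion, whereas here a made-up additive `toy_score` stands in for that criterion, and the predictor names `x1` through `x6` are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Sequence

Model = FrozenSet[str]


def all_subsets(variables: Sequence[str]) -> List[Model]:
    """Every candidate model: each variable is either in or out, giving 2**p subsets."""
    return [frozenset(c)
            for k in range(len(variables) + 1)
            for c in combinations(variables, k)]


def best_subsets(variables: Sequence[str],
                 score: Callable[[Model], float],
                 top: int = 1) -> Dict[int, List[Model]]:
    """For each model size k, score every k-variable subset and keep the `top` best."""
    best: Dict[int, List[Model]] = {}
    for k in range(len(variables) + 1):
        models = [frozenset(c) for c in combinations(variables, k)]
        # Highest score first; ties broken alphabetically so results are reproducible.
        models.sort(key=lambda m: (-score(m), sorted(m)))
        best[k] = models[:top]
    return best


# Hypothetical example: six generic predictors and a made-up additive score.
variables = ["x1", "x2", "x3", "x4", "x5", "x6"]
contribution = {"x1": 5.0, "x2": 3.0, "x3": 0.5, "x4": 2.0, "x5": 0.2, "x6": 1.5}


def toy_score(model: Model) -> float:
    return sum(contribution[v] for v in model)


models = all_subsets(variables)
print(len(models))  # 64
print([sum(1 for m in models if len(m) == k) for k in range(7)])  # [1, 6, 15, 20, 15, 6, 1]
for k, ms in best_subsets(variables, toy_score).items():
    print(k, sorted(ms[0]))
```

The per-size counts are the binomial coefficients $\binom{6}{k}$, which sum to $2^6 = 64$. Because every subset is scored, any model that forward or backward selection passes through is guaranteed to be in the list.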