Intermediate Statistics for Dummies
Chapter 6: One Step Forward and Two Steps Back: Regression Model Selection
In the punt distance example, you can see in Figure 6-2 (forward selection) that the computer includes hang time first because it makes the biggest contribution toward estimating y. In Figure 6-4 (backward selection), however, all the variables are in the model from the get-go. After the weakest variable on all counts (left foot flexibility) is eliminated, the remaining variables are the ones strongly related to hang time (see Figure 6-1). Because those variables already carry most of the information that hang time provides, hang time becomes a redundant variable, so it is removed.
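The contrast between the two procedures can be sketched in Python. This is a minimal illustration, not Minitab's algorithm: it assumes NumPy, uses synthetic data rather than the punt data, and uses a hypothetical partial F cutoff of 4.0 (the parameter names `f_in` and `f_out` are made up here) to decide when a variable enters or leaves. Column 0 plays a role like hang time: it is nearly the sum of columns 1 and 2, which are what actually drive y.

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares for OLS of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def partial_f(X, y, small, big):
    """Partial F statistic for the one extra variable in `big` versus `small`."""
    sse_small, sse_big = sse(X, y, small), sse(X, y, big)
    df_resid = len(y) - len(big) - 1  # residual df of the bigger model
    return (sse_small - sse_big) / (sse_big / df_resid)

def forward_selection(X, y, f_in=4.0):
    """Start empty; add the variable that lowers SSE most until it no longer helps enough."""
    chosen, remaining = [], list(range(X.shape[1]))
    while remaining:
        best = min(remaining, key=lambda c: sse(X, y, chosen + [c]))
        if partial_f(X, y, chosen, chosen + [best]) < f_in:
            break  # the biggest remaining contribution is too small: stop
        chosen.append(best)
        remaining.remove(best)
    return chosen

def backward_elimination(X, y, f_out=4.0):
    """Start full; drop the weakest variable while its partial F is below the cutoff."""
    kept = list(range(X.shape[1]))
    while kept:
        # Weakest variable = the one whose removal raises SSE the least.
        worst = min(kept, key=lambda c: sse(X, y, [k for k in kept if k != c]))
        reduced = [k for k in kept if k != worst]
        if partial_f(X, y, reduced, kept) >= f_out:
            break  # even the weakest variable contributes enough: stop
        kept = reduced
    return kept

# Synthetic data (hypothetical, not the book's punt data).
rng = np.random.default_rng(0)
n = 500
x1, x2, x3 = rng.normal(size=(3, n))
x0 = x1 + x2 + 0.3 * rng.normal(size=n)  # redundant "hang time"-like column
y = x1 + x2 + rng.normal(size=n)
X = np.column_stack([x0, x1, x2, x3])

fwd = forward_selection(X, y)
bwd = backward_elimination(X, y)
print("forward selection order:", fwd)
print("backward elimination keeps:", bwd)
```

Forward selection should grab column 0 first because it is the best single predictor, while backward elimination holds on to columns 1 and 2 and, with this setup, usually drops column 0 as redundant. Swapping the F cutoff for p-value thresholds, as statistical packages commonly do, would not change the overall logic.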
The best subsets procedure takes a totally different approach from forward and backward selection. It simply looks at all possible models you could have and chooses the best ones at each level (one variable, two variables, three variables, and so on). This procedure has no step-by-step building process in which later models depend on what was selected in earlier steps. That means the best subsets procedure can easily give different results from either of the other two procedures, simply because it has many more possible models to choose from.
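A best subsets search can be sketched the same way: fit every combination of predictors, group the fits by size, and keep the top few at each size. This sketch assumes NumPy, uses synthetic data rather than the punt data, and ranks models by R² only (a simplification; software output typically also reports statistics such as adjusted R² and Mallows' Cp). The `per_size` parameter, for how many models to keep per level, is hypothetical.

```python
import itertools
import numpy as np

def r_squared(X, y, cols):
    """R^2 for an OLS fit of y on an intercept plus the columns in `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sst = float(((y - y.mean()) ** 2).sum())
    return 1.0 - float(resid @ resid) / sst

def best_subsets(X, y, per_size=2):
    """For each model size k, return the `per_size` highest-R^2 subsets."""
    p = X.shape[1]
    results = {}
    for k in range(1, p + 1):
        # Every subset of size k is fitted; nothing depends on earlier sizes.
        fits = [(r_squared(X, y, cols), cols)
                for cols in itertools.combinations(range(p), k)]
        fits.sort(key=lambda fit: fit[0], reverse=True)
        results[k] = fits[:per_size]
    return results

# Synthetic data (hypothetical, not the book's punt data).
rng = np.random.default_rng(0)
n = 500
x1, x2, x3 = rng.normal(size=(3, n))
x0 = x1 + x2 + 0.3 * rng.normal(size=n)
y = x1 + x2 + rng.normal(size=n)
X = np.column_stack([x0, x1, x2, x3])

results = best_subsets(X, y)
for k, fits in results.items():
    for r2, cols in fits:
        print(f"size {k}: columns {cols}, R^2 = {r2:.3f}")
```

Within a single size, ranking by R² gives the same order that adjusted R² or Mallows' Cp would, because those statistics only penalize differences in model size; the choice of criterion matters when you compare the winners across sizes.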
How do the procedures stack up against each other?
So the big question is: Which model selection procedure is the best one? You can't find a straight answer to that. The debate over this issue goes on and on among the various research groups that analyze their data by using model selection procedures. All three procedures are available in Minitab, for example, so they are considered viable procedures. However, many statisticians do prefer one model selection procedure over the others; I reveal which one later in this section, along with the positives and negatives of each procedure.
Looking at the positives
What's nice about each of these procedures is that it has some order to it and makes sense. You don't take a haphazard approach with any of the procedures, and any two people who use the same procedure, with the same settings, to build the best model from the same data set would get the same answer, which is reassuring. All three procedures also usually provide reasonable results and final models that have interpretative value, and each has its own plus side: forward selection keeps the models as simple as possible; backward selection helps you avoid missing any important variables; and the best subsets procedure examines every possible model and makes direct comparisons between them.
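The cost of examining every possible model grows quickly with the number of candidate variables. The short count below assumes p candidate variables and counts the candidate models each procedure may fit: forward selection tries at most p + (p - 1) + ... + 1 = p(p + 1)/2 models, backward selection fits the full model plus at most that many reduced models, and best subsets fits all 2^p - 1 nonempty subsets. The function names are made up for illustration.

```python
def forward_max_fits(p):
    """Most candidate models forward selection can fit: p + (p - 1) + ... + 1."""
    return p * (p + 1) // 2

def backward_max_fits(p):
    """The full model plus p + (p - 1) + ... + 1 candidate removals."""
    return 1 + p * (p + 1) // 2

def best_subsets_fits(p):
    """Every nonempty subset of p candidate variables."""
    return 2 ** p - 1

for p in (4, 10, 20):
    print(p, forward_max_fits(p), backward_max_fits(p), best_subsets_fits(p))
# prints:
# 4 10 11 15
# 10 55 56 1023
# 20 210 211 1048575
```

With 20 candidate variables, best subsets must compare over a million models, versus a couple of hundred for forward or backward selection.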