Page 134 - Intermediate Statistics for Dummies
Chapter 6: One Step Forward and Two Steps Back: Regression Model Selection
Using the Forward Model Selection Procedure
The first of the three model selection procedures I present in this chapter is
called forward selection. This process gives a systematic way of selecting a
good model to predict y. It starts out with no variables at all and then adds
one variable at a time, each time including the variable that contributes the
most toward estimating y, given the other variables that are already in the
model.
This section shows you how the forward selection procedure works for
selecting a final regression model, and what the philosophy is for doing so.
It also shows you how to assess the fit of the final model by using some new
criteria.
Adding variables one at a time
The forward selection procedure starts with a model that contains no x
variables and then adds x variables one at a time until the final model has
been reached.
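To make the idea concrete, here is a minimal Python sketch of forward selection. It is an illustration under stated assumptions, not what Minitab or any other package actually runs. The names fit_ols and forward_select and the data set are made up for this example. To stay free of outside libraries, the sketch compares the absolute t statistic to a fixed cutoff, t_crit, as a rough stand-in for comparing a p-value to the entry level α. It also assumes the usual stopping rule: quit when no remaining variable clears the cutoff.

```python
import math


def fit_ols(columns, y):
    """Least-squares fit of y on an intercept plus the given x columns.

    Returns (coefficients, standard_errors), with the intercept first.
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[a] + [1.0 if a == b else 0.0 for b in range(p)] for a in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    inv = [row[p:] for row in aug]
    coefs = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    fitted = [sum(coef * value for coef, value in zip(coefs, r)) for r in rows]
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    std_errors = [math.sqrt(mse * inv[a][a]) for a in range(p)]
    return coefs, std_errors


def forward_select(data, y, t_crit):
    """Add x variables one at a time, largest |t| first, until none qualifies.

    t_crit is a hypothetical fixed cutoff standing in for the entry level alpha.
    """
    selected = []
    remaining = list(data)
    while remaining:
        best_name, best_t = None, 0.0
        for name in remaining:
            # Fit the current model plus this one candidate variable.
            columns = [data[v] for v in selected] + [data[name]]
            coefs, std_errors = fit_ols(columns, y)
            t_stat = abs(coefs[-1] / std_errors[-1])  # t for the candidate
            if t_stat > best_t:
                best_name, best_t = name, t_stat
        if best_name is None or best_t < t_crit:
            break  # no candidate is significant enough to enter
        selected.append(best_name)
        remaining.remove(best_name)
    return selected


# Made-up data: y depends on x1 and x2 but not on x3.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, -1, -1, 1, 1, -1, -1, 1],
    "x3": [1, 1, -1, -1, -1, -1, 1, 1],
}
y = [6.6, 5.4, 7.6, 12.4, 14.4, 13.6, 15.4, 20.6]

print(forward_select(data, y, t_crit=2.5))  # ['x1', 'x2']
```

In this made-up data set, x1 enters first because it has the largest t statistic on its own, x2 enters next once x1 is in the model, and x3 never clears the cutoff. Real software reports a p-value for each candidate and compares it to α, which accounts for the degrees of freedom changing as variables are added.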
Here’s how the forward selection procedure works in general. Before the
hair begins to stand up on the back of your neck, note that Minitab or any
other statistical software takes care of all the heavy lifting for this and
all the other model selection procedures:
1. Choose a prespecified value of α for determining when to add a
variable to the model.
This α is called the entry level for a variable. Typically you want to
choose the value α = 0.05 or 0.10 as the entry level. The higher the α
level, the easier it is to add a variable to the model.
2. Start with the model containing no variables: y = b₀.
You are left with just the constant term b₀.
3. Go through each possible x variable that could be included in the
model and test each one’s coefficient to see whether it’s statistically
significant by using a t-test.
If the variable is statistically significant, it has a significant contribution
to determining y, given that the rest of the variables in the model are
fixed. Any variable that isn’t statistically significant is out of the running
to be added to the model at this point. (See Chapter 5 for details on conducting
t-tests for regression coefficients.)