variables and adding x variables one by one until you stop, you start with
all the x variables in the model and remove x variables one by one until you
stop. You may think that the forward selection procedure and the backward
selection procedure would give you the same final model, but in many cases
they don’t, which you can discover in the sections that follow.
Eliminating variables one by one
The backward selection procedure starts out with the full multiple regression
model containing all k of the x variables. The starting model is
y = b₀ + b₁x₁ + . . . + bₖxₖ. The object is to whittle the model down so it
includes the smallest number of variables needed to still fit well.
(Statisticians, as mysterious, mystical, and complicated as they may seem,
actually like their models to be as simple as possible!)
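If you'd rather see the idea in code than in Minitab menus, here's a minimal sketch of backward elimination in Python using the statsmodels library. The simulated data set, the variable names x1 through x3, and the removal level of 0.10 are illustrative assumptions, not an example from this book.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Illustrative data: y really depends on x1 and x2; x3 is pure noise.
n = 100
X = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),
})
y = 3 + 2 * X["x1"] - 1.5 * X["x2"] + rng.normal(size=n)

def backward_select(y, X, removal_level=0.10):
    """Remove the least significant x variable one at a time until every
    remaining coefficient has a p-value at or below the removal level."""
    kept = list(X.columns)
    while kept:
        model = sm.OLS(y, sm.add_constant(X[kept])).fit()
        pvalues = model.pvalues.drop("const")   # t-test p-value for each coefficient
        worst = pvalues.idxmax()                # least significant variable
        if pvalues[worst] > removal_level:
            kept.remove(worst)                  # not significant: drop it and refit
        else:
            return model, kept                  # everything left is significant
    return None, kept                           # nothing survived

final_model, final_vars = backward_select(y, X)
print("Variables kept:", final_vars)            # typically ['x1', 'x2'] for this simulated data
print(final_model.summary())

The loop mirrors the steps described next: fit the model, find the coefficient with the largest p-value, and drop that variable if its p-value is above the removal level.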
The computer does all the work for all model selection procedures, but you
have to set the criteria for when a variable is allowed to be removed. You're
also left standing with the output that needs to be interpreted. Don't worry
though. It's all a step-by-step process that you take one step at a time. (Hopefully
those steps are forward and not backward, right? Right.)
In general, here’s how the backward selection procedure works (note that
Minitab does all the work for you on this procedure; all you have to do is
interpret the results and understand the process by which those results
were attained):
1. Choose a prespecified value of α for determining when to remove a
variable from the model.
In the backward selection procedure, you call α the removal level.
Typically you want to choose the removal level α = 0.10. A variable is
removed when the p-value for its coefficient is greater than α, so the lower
the α level, the easier it is to remove a variable from the model. Statisticians
warn against using a removal level lower than the traditional value of 0.10
for fear of dropping variables out of the model too quickly, removing
important contributions that may be made by those variables. However,
if α is too large, the model could wind up being overly complex.
2. Start with the model containing all of the x variables: y = b₀ + b₁x₁ +
b₂x₂ + . . . + bₖxₖ, where k is the total number of x variables.
Remember that this model is called the full model.
3. Conduct a t-test on the coefficient of each x variable to see whether
it’s statistically significant (see Chapter 5 for conducting t-tests on
coefficients of a multiple regression model), and note the p-value of
each t-test.