Page 135 - Intermediate Statistics for Dummies
Part II: Making Predictions by Using Regression
4. Examine the p-values from each of the t-tests in step three (listed on
the Minitab output) and choose the smallest one.
The variable associated with that p-value is the best candidate to be
added to the model, because that variable is the most statistically
significant of all the possible x variables at this point.
5. If the p-value for the x variable found in step four is smaller than the
prespecified α, add that x variable to the model.
After the first round, you have the model $y = b_0 + b_i x_i$, where $x_i$
refers to the first x variable you added to the model.
6. Repeat steps three through five, using the new model from step
five, and keep adding variables one at a time as long as the smallest
p-value of each round is less than the prespecified α (here, α = 0.05).
If the smallest p-value in a round is not smaller than the prespecified α,
don’t add any more variables to the model and stop the forward selection
process.
Your final model contains all of the x variables that were added during
each phase of the forward selection process.
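
The loop in steps four through six can be sketched in Python. This is only an
illustration, not what Minitab runs internally. It assumes that step three
fits one trial model per remaining x variable and runs a t-test on that
variable's coefficient. The names coefficient_p_values and forward_select,
and the simulated data, are made up for this example.

```python
import numpy as np
from scipy import stats


def coefficient_p_values(X, y):
    """Least-squares fit of y on X (X must already contain an intercept
    column); returns the two-sided t-test p-value for each coefficient."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    df = n - k                               # residual degrees of freedom
    sigma2 = residuals @ residuals / df      # estimated error variance
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return 2 * stats.t.sf(np.abs(beta / std_err), df)


def forward_select(predictors, y, alpha=0.05):
    """predictors maps a variable name to a 1-D array of values.
    Returns the names of the variables added, in the order they entered."""
    y = np.asarray(y, dtype=float)
    selected, remaining = [], list(predictors)
    while remaining:
        # Assumed step three: try each remaining variable on top of the
        # current model and record the p-value of its own t-test.
        trial_p = {}
        for name in remaining:
            columns = [np.ones(len(y))]
            columns += [np.asarray(predictors[v], dtype=float)
                        for v in selected + [name]]
            X = np.column_stack(columns)
            trial_p[name] = coefficient_p_values(X, y)[-1]
        best = min(trial_p, key=trial_p.get)   # step four: smallest p-value
        if trial_p[best] >= alpha:             # step six: stop the process
            break
        selected.append(best)                  # step five: add the variable
        remaining.remove(best)
    return selected


# Simulated data (hypothetical): only x1 is truly related to y.
rng = np.random.default_rng(0)
n = 200
data = {name: rng.normal(size=n) for name in ("x1", "x2", "x3")}
y = 3 + 2 * data["x1"] + rng.normal(size=n)
print(forward_select(data, y, alpha=0.05))
```

The alpha argument plays the role of the prespecified α. Raising it lets more
variables into the final model, and lowering it lets fewer in.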
To find a best multiple linear regression model by using the forward selection
procedure in Minitab, go to Stat>Regression>Stepwise. Highlight the response
(y) variable and click Select; this variable shows up in the Response box.
Then highlight the predictor (x) variables and click Select; these variables
show up in the Predictor box. Click Methods, and then click Forward Selection.
In the Alpha to Enter box, type the prespecified value of α that an x
variable’s p-value must fall below for the variable to be included in the
model. Statisticians typically set this value between 0.05 and 0.10. (I use
0.05.) This prespecified α level is called the entry level for the forward
selection procedure. The higher the entry level, the easier it is for a
variable to be entered, but the greater the chance that the variable appears
significant only by random chance. (In the F-value box, the default is 4.0,
which should be fine. The F-value is beyond the scope of this book in this
context, although you do work with it when you do analysis of variance; see
Chapter 10.) Click OK, and you get the output from the forward selection
procedure.
You use a prespecified α level as the entry criterion for adding a variable
because it represents the chance of making a Type I error: inadvertently
putting in a variable, based on your sample, when it shouldn’t be included.
(See Chapter 3 for more on Type I errors.) You choose a small α level because
you don’t want to make it too easy to add a variable; a larger α increases the
chance of adding something that isn’t truly meaningful. (You have to put a lid
on it somehow!)
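
To see why the entry level acts as a lid, here is a small simulation sketch in
Python. It assumes a single candidate x variable that has no real relationship
with y, so every time its p-value falls below α the variable gets in by a Type
I error. The sample size, number of simulated samples, and random seed are
arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n_obs = 2000, 50   # arbitrary illustration settings

# For each simulated sample, x and y are generated independently, so the
# true slope is zero and any "significant" result is a false alarm.
p_values = np.empty(n_sims)
for i in range(n_sims):
    x = rng.normal(size=n_obs)
    y = rng.normal(size=n_obs)
    p_values[i] = stats.linregress(x, y).pvalue

# Share of samples in which the noise variable would clear each entry level.
entry_rate = {alpha: float(np.mean(p_values < alpha))
              for alpha in (0.01, 0.05, 0.10)}
for alpha, rate in entry_rate.items():
    print(f"entry level {alpha:.2f}: noise variable admitted "
          f"in {rate:.1%} of samples")
```

Each rate should land near its entry level, which is the sense in which α
represents the chance of letting in a variable that doesn’t belong.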