Page 125 - Statistics II for Dummies
Chapter 6: How Can I Miss You If You Won’t Leave? Regression Model Selection
to distance, and lots of other variables related to hang time. Although hang
time is clearly the most related to distance, the final multiple regression
model may not include hang time.
Here’s one possible scenario: You find a combination of other x variables
that can do a good job estimating y together. And all those other variables
are strongly related to hang time. This result may mean that in the end you
don’t need to include hang time in the model. Strange things happen when
you have many different x variables to choose from.
After you narrow down the set of possible x variables for inclusion in the
model to predict punt distance, the next step is to put those variables
through a selection procedure to trim down the list to a set of essential
variables for predicting y.
Just Like Buying Shoes: The Model Looks Nice, But Does It Fit?
When you get into model selection procedures, you find that many different
methods exist for selecting the best model, according to a wide range
of criteria. Each one can result in models that differ from each other, but
that’s something I love about statistics: Sometimes there’s no one single best
answer.
The three model selection procedures covered in this section are
✓ Best subsets procedure
✓ Forward selection
✓ Backward selection
Of all the model selection procedures out there, the one that gets the most
votes with statisticians is the best subsets procedure, which examines every
single possible model and determines which one fits best, using certain
criteria.
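To make "examines every single possible model" concrete, here's a minimal Python sketch of the idea. It is not the routine any particular statistics package runs. It assumes adjusted R-squared as the comparison criterion (only one of several you could pick), and the predictors x1, x2, x3 and the response y are made-up toy numbers, not punt data. The helper names solve, adjusted_r2, and best_subsets are hypothetical.

```python
from itertools import combinations

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def adjusted_r2(columns, y):
    """Least-squares fit of y on an intercept plus the given columns; return adjusted R^2."""
    n, p = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p + 1)] for j in range(p + 1)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - (sse / (n - p - 1)) / (sst / (n - 1))

def best_subsets(predictors, y):
    """Score every non-empty subset of the predictors; return (score, subset) pairs, best first."""
    names = sorted(predictors)
    results = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            score = adjusted_r2([predictors[name] for name in subset], y)
            results.append((score, subset))
    return sorted(results, reverse=True)

# Toy data (an assumption for illustration only): x2 closely tracks x1,
# and y is built from x1 and x3 plus a little fixed "noise".
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [5, 3, 8, 1, 9, 2, 7, 4],
}
noise = [0.1, -0.2, 0.15, -0.05, 0.1, -0.1, 0.05, -0.05]
y = [1 + 2 * a + 3 * c + e
     for a, c, e in zip(predictors["x1"], predictors["x3"], noise)]

ranked = best_subsets(predictors, y)
for score, subset in ranked:
    print(f"{score:.4f}  {', '.join(subset)}")
```

With three candidate variables, the loop fits 2^3 - 1 = 7 models. That count roughly doubles with every variable you add, which is why you leave the fitting to software and save your energy for reading the results.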
In this section, you see different methods statisticians use to assess and
compare the fit of different models. You see how the best subsets procedure
works for model selection in a step-by-step manner. Then I show you
how to take all the information given to you and wade through it to make
your way to the answer — the best-fitting model based on a subset of the
available x variables. Finally, you see how this procedure is applied to find
a model to predict punt distance.