Page 125 - Statistics II for Dummies

Chapter 6: How Can I Miss You If You Won’t Leave? Regression Model Selection
                                  to distance, and lots of other variables related to hang time. Although hang
                                  time is clearly the most related to distance, the final multiple regression
                                  model may not include hang time.

                                  Here’s one possible scenario: You find a combination of other x variables
                                  that can do a good job estimating y together. And all those other variables
                                  are strongly related to hang time. This result may mean that in the end you
                                  don’t need to include hang time in the model. Strange things happen when
                                  you have many different x variables to choose from.
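
One way to see why this can happen, as a sketch under simplifying assumptions that are not from the text: suppose distance y depends on hang time h, and h is itself almost a linear combination of two other variables. The symbols x_1 and x_2 are hypothetical stand-ins for "those other variables," and the error terms are assumed small.

```latex
\begin{align*}
y &= \beta_0 + \beta_h h + \varepsilon, \\
h &= \gamma_0 + \gamma_1 x_1 + \gamma_2 x_2 + \delta, \\
\Rightarrow\quad y &= (\beta_0 + \beta_h \gamma_0) + \beta_h \gamma_1 x_1 + \beta_h \gamma_2 x_2 + (\beta_h \delta + \varepsilon).
\end{align*}
```

When the leftover term \delta is small, x_1 and x_2 already carry nearly all the information in hang time, so adding h to a model that contains them improves the fit very little.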

                                  After you narrow down the set of possible x variables for inclusion in the
                                  model to predict punt distance, the next step is to put those variables
                                  through a selection procedure to trim down the list to a set of essential
                                  variables for predicting y.



                       Just Like Buying Shoes: The Model Looks Nice, But Does It Fit?


                                  When you get into model selection procedures, you find that many different
                                  methods exist for selecting the best model, according to a wide range
                                  of criteria. Each method can lead to a different model, but
                                  that’s something I love about statistics: Sometimes there’s no one single best
                                  answer.

                                  The three model selection procedures covered in this section are

                                   ✓ Best subsets procedure
                                   ✓ Forward selection
                                   ✓ Backward selection

                                  Of all the model selection procedures out there, the one that gets the most
                                  votes with statisticians is the best subsets procedure, which examines every
                                  single possible model and determines which one fits best, using certain
                                  criteria.
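
The idea of examining every possible model can be sketched in a few lines of Python. This is a minimal illustration, not the book's procedure: the data values are made up, the names x1, x2, x3 and the helper functions are hypothetical, and adjusted R-squared is assumed as the fit criterion because the text says only "certain criteria."

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """R-squared of the least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(design[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def best_subsets(data, y):
    """Fit every non-empty subset of predictors; return (adjusted R-squared, names), best first."""
    n = len(y)
    names = sorted(data)
    results = []
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            r2 = r_squared([data[name] for name in subset], y)
            adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
            results.append((adj, subset))
    return sorted(results, reverse=True)


# Made-up illustrative data (not the punt data from the book).
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [5, 3, 8, 1, 7, 2, 6, 4],
}
y = [3.1, 3.9, 7.2, 6.8, 11.1, 10.9, 15.2, 14.8]

ranked = best_subsets(data, y)
for adj, subset in ranked:
    print(f"{adj:.4f}  {', '.join(subset)}")
```

With 3 candidate variables there are 2^3 - 1 = 7 models to fit. The count doubles with each added variable, which is why best subsets is usually left to software.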

                                  In this section, you see different methods statisticians use to assess and
                                  compare the fit of different models. You see how the best subsets
                                  procedure works for model selection in a step-by-step manner. Then I show you
                                  how to take all the information given to you and wade through it to make
                                  your way to the answer: the best-fitting model based on a subset of the
                                  available x variables. Finally, you see how this procedure is applied to find
                                  a model to predict punt distance.









