
Chapter 6: How Can I Miss You If You Won’t Leave? Regression Model Selection
added value of the additional variable is outweighed by the number of variables in the model. This gives you an idea of how much or how little added value you get from a bigger model (bigger isn't always better).
✓ Mallow's C-p: Mallow's C-p takes the amount of error left unexplained by a model with p of the x variables, divides that number by the average amount of error left over from the full model (the one with all the x variables), and adjusts that result for the number of observations (n) and the number of x variables used (p). In general, the smaller Mallow's C-p is, the better, because when it comes to the amount of error in your model, less is more. A C-p value close to p (the number of x variables in the model) reflects a model that fits well.
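
For reference, the idea in that bullet can be written as a formula. This is a common textbook form rather than something on this page, and conventions differ slightly on whether p counts the intercept:

    C_p = \frac{\mathrm{SSE}_p}{\mathrm{MSE}_{\mathrm{full}}} - (n - 2p)

Here SSE_p is the error sum of squares for the candidate model, MSE_full is the mean squared error of the full model, and n is the number of observations. Under the convention where p counts all estimated coefficients (intercept included), a well-fitting candidate model has SSE_p close to (n - p) times MSE_full, which puts C_p close to p, matching the rule of thumb above.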


Model selection procedures

The process of finding the "best" model is not cut and dried. (Heck, even the definition of "best" here isn't cut and dried.) Many different procedures exist for going through different models in a systematic way, evaluating each one, and stopping at the right model. Three of the more common model selection procedures are forward selection, backward selection, and the best subsets procedure. In this section you get a very brief overview of the forward and backward selection procedures, and then you get into the details of the best subsets procedure, which is the one statisticians use most.

Going with the forward selection procedure
The forward selection procedure starts with a model with no variables in it and adds variables one at a time, according to the amount of contribution each one can make to the model.

Start by choosing an entry level, a value of α that a variable's p-value must beat before the variable can enter the model. Then run hypothesis tests (see Chapter 3 for instructions) for each x variable to see how it's related to y. The x variable with the smallest p-value wins and is added to the model, as long as its p-value is smaller than the entry level. You keep doing this with the remaining variables until the one with the smallest p-value doesn't make the entry level. Then you stop.
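
To make the procedure concrete, here's a minimal Python sketch of forward selection. It isn't from the book; it assumes your y is a pandas Series, your candidate x variables sit in a pandas DataFrame, and the statsmodels library is available to fit the regressions:

import pandas as pd
import statsmodels.api as sm

def forward_selection(X, y, entry_level=0.05):
    # Start with no variables; add the best candidate each round.
    remaining = list(X.columns)
    chosen = []
    while remaining:
        # p-value of each candidate when added to the current model
        pvalues = {}
        for candidate in remaining:
            fit = sm.OLS(y, sm.add_constant(X[chosen + [candidate]])).fit()
            pvalues[candidate] = fit.pvalues[candidate]
        best = min(pvalues, key=pvalues.get)
        # Stop when even the best candidate misses the entry level.
        if pvalues[best] >= entry_level:
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen

Notice that once a variable lands in chosen, the loop never takes it back out; that's exactly the drawback described next.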

The drawback of the forward selection procedure is that after a variable is added, it's never removed, even if later additions make it redundant. As a result, the best model might never even get tested.

Opting for the backward selection procedure
The backward selection procedure does the opposite of the forward selection method. It starts with a model containing all the x variables and removes variables one at a time; those that make the least contribution to the model are removed first. You choose a removal level to begin; then you test the x variables in the fitted model and remove the one with the largest p-value, as long as that p-value is larger than the removal level. You keep refitting and removing until every variable left in the model makes the removal level.
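
Mirroring the earlier sketch, here's backward selection in Python under the same assumptions (pandas data, statsmodels for the fits); again, this is an illustration, not the book's own implementation:

import pandas as pd
import statsmodels.api as sm

def backward_selection(X, y, removal_level=0.05):
    # Start with every variable; drop the weakest one each round.
    chosen = list(X.columns)
    while chosen:
        fit = sm.OLS(y, sm.add_constant(X[chosen])).fit()
        pvalues = fit.pvalues.drop("const")  # ignore the intercept
        worst = pvalues.idxmax()
        # Stop when every remaining variable makes the removal level.
        if pvalues[worst] <= removal_level:
            break
        chosen.remove(worst)
    return chosen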








