How well does the model fit?
The details regarding the formulas used behind the model selection procedures in this chapter are beyond the scope of this book. However, what's most important is knowing what each procedure is doing and how to interpret the results. To assess the fit of any multiple regression model, you can use the following three techniques: R², R² adjusted, and Mallows's C-p. You can find all three on the bottom line of the Minitab output when you do any sort of model selection procedure.
I describe these techniques in the following:
R²: R² is the percentage of the variability in the y values that's explained by the model. It falls between 0 percent and 100 percent (0 and 1.0). Values closer to 0 mean the model doesn't do a good job of explaining y. Values closer to 1.0 mean the model does an excellent job. Typically, I say that you can consider R² values higher than 0.70 to be good.
R² adjusted: R² adjusted is the value of R², adjusted down for a higher number of variables in the model (which makes it much more useful than the regular value of R²). A high value of R² adjusted means the model you have is fitting the data very well. I typically consider a value of 0.70 or above to be high for R² adjusted.
Mallows's C-p: Mallows's C-p is another measure of how well a model fits. It basically looks at how much error is left unexplained by a model with k predictor (x) variables, compares that to the average error left over from the full model (with all the x variables), and adjusts for the number of variables in the model. The smaller Mallows's C-p is, the better, because when it comes to the amount of error in your model, less is more. (A small computational sketch of all three measures follows this list.)
Always use R² adjusted rather than the regular R² to assess the fit of a multiple regression model. With every addition of a new variable into a multiple regression model, the value of R² stays the same or increases; it will never go down. That's because a new variable will either help explain some of the variability in the y's (thereby increasing R² by definition), or it will do nothing (leaving R² exactly where it was before). So theoretically, you could just keep adding more and more variables into the model for the sake of getting a larger value of R². Here's why R² adjusted is important: It keeps you from adding more and more variables by taking into account how many variables are in the model. This way, the value of R² adjusted can actually decrease if the added value of the additional variable is outweighed by the number of variables in the model. This gives you an idea of how much or how little added value you get from a bigger model (bigger isn't always better).
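To see that warning in action, here is a second minimal sketch (again with hypothetical data, not an example from the book). It fits a model with one useful predictor, then adds a pure-noise predictor, and compares R² with R² adjusted for both fits.

import numpy as np

rng = np.random.default_rng(1)
n = 25
x1 = rng.normal(size=n)
junk = rng.normal(size=n)                   # deliberately unrelated to y
y = 5 + 1.5 * x1 + rng.normal(size=n)

def fit_stats(predictors, y):
    # Return (R-squared, adjusted R-squared) for an OLS fit with an intercept.
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(((y - X @ beta) ** 2).sum())
    sst = float(((y - y.mean()) ** 2).sum())
    k = len(predictors)
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    return r2, r2_adj

print("x1 only:   R^2 = %.4f, adjusted R^2 = %.4f" % fit_stats([x1], y))
print("x1 + junk: R^2 = %.4f, adjusted R^2 = %.4f" % fit_stats([x1, junk], y))
# R^2 can only stay the same or go up when the extra variable is added;
# adjusted R^2 will usually go down when that variable explains nothing new.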