Page 126 - Statistics II for Dummies
Part II: Using Different Types of Regression to Make Predictions
Assessing the fit of multiple regression models
For any model selection procedure, assessing the fit of each model being
considered is built into the process. In other words, as you go through all the
possible models, you’re always keeping an eye on how well each model fits.
So before you get into a discussion of how to do the best subsets procedure,
you need criteria to assess how well a particular model fits a data set.
Although there are tons of different statistics for assessing the fit of
regression models, I discuss the most popular ones: R² (simple linear
regression only), R² adjusted, and Mallows's C-p. All three appear on the bottom line
of the Minitab output when you do any sort of model selection procedure.
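For reference, the two R²-based statistics just named are usually written as shown below. This is the standard textbook form rather than anything spelled out in this section, and the symbols are assumptions introduced here: SSE is the sum of squared residuals, SSTO is the total sum of squares of the y values around their mean, n is the number of observations, and p is the number of predictor variables in the model.

```latex
R^2 = 1 - \frac{SSE}{SSTO},
\qquad
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

As a quick check with made-up numbers, a model with R² = 0.80, n = 25, and p = 4 has R² adjusted = 1 - (0.20)(24/20) = 0.76.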
Here’s a breakdown of the assessment techniques:
✓ R²: R² is the percentage of the variability in the y values that’s explained
by the model. It falls between 0 and 100 percent (0 and 1.0). In simple
linear regression (see Chapter 4), a high value of R² means the line fits
well, and a low value of R² means the line doesn’t fit well.
When you have multiple regression, however, there’s a bit of a catch
here. As you add more and more variables (whether or not they’re significant),
the value of R² increases or stays the same; it never goes down. This
can result in an inflated measure of how well the model fits. Of course,
statisticians have a fix for the problem, which leads me to the next item
on this list.
✓ R² adjusted: R² adjusted takes the value of R² and adjusts it downward
according to the number of variables in the model. The higher the
number of variables in the model, the lower the value of R² adjusted will
be, compared to the original R².
A high value of R² adjusted means the model you have is fitting the data
very well (the closer to 1, the better). In my experience, a value of about
0.70 is considered okay for R² adjusted, and the higher the better.
Always use R² adjusted rather than the regular R² to assess the fit of a
multiple regression model. With every addition of a new variable into a
multiple regression model, the value of R² stays the same or increases. It
will never go down because a new variable will either help explain some
of the variability in the y’s (thereby increasing R² by definition), or it will
do nothing (leaving R² exactly where it was before). So theoretically, you
could just keep adding more and more variables into the model just for
the sake of getting a larger value of R².
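The behavior described above can be checked with a small simulation. This is only a sketch under stated assumptions: the data are made up, the helper names `r_squared` and `adjusted_r_squared` are hypothetical, and the adjustment uses the standard textbook formula 1 - (1 - R²)(n - 1)/(n - p - 1), which this section does not state. It uses NumPy rather than Minitab.

```python
import numpy as np


def r_squared(X, y):
    """R^2 = 1 - SSE/SSTO for a least-squares fit of y on X plus an intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sse = float(resid @ resid)                  # unexplained variability
    ssto = float(((y - y.mean()) ** 2).sum())   # total variability in y
    return 1.0 - sse / ssto


def adjusted_r_squared(r2, n, p):
    """Assumed standard adjustment for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


rng = np.random.default_rng(0)   # made-up data, for illustration only
n = 30
x1 = rng.normal(size=n)                    # one predictor that really matters
y = 2.0 + 3.0 * x1 + rng.normal(size=n)
noise = rng.normal(size=(n, 5))            # five predictors unrelated to y

results = []
for k in range(6):
    # Model with x1 plus the first k noise predictors
    X = np.column_stack([x1, noise[:, :k]])
    p = X.shape[1]
    r2 = r_squared(X, y)
    results.append((p, r2, adjusted_r_squared(r2, n, p)))
    print(f"p={p}  R2={r2:.4f}  R2 adjusted={results[-1][2]:.4f}")
```

Reading down the printed table, the R² column can only hold steady or creep upward as the noise predictors are added, while the R² adjusted column is free to fall.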
R² adjusted is important because it keeps you from adding more and
more variables by taking into account how many variables there already
are in the model. The value of R² adjusted can actually decrease if the