Page 227 - Statistics II for Dummies
Chapter 12: Regression and ANOVA: Surprise Relatives!
Assessing the fit of the regression model
Before you go ahead and use a regression model to make predictions for y
based on an x variable, you must first assess the fit of your model. You can
do this with a scatterplot and correlation, or with R².
Using a scatterplot and correlation
One way to get a rough idea of how well your regression model fits is by
using a scatterplot, which is a graph showing all the pairs of data plotted in
the x-y plane. Use the scatterplot to see whether the data appear to fall in the
pattern of a line. If the data appear to follow a straight-line pattern (or even
something close to that — anything but a curve or a scattering of points that
has no pattern at all), you calculate the correlation, r, to see how strong the
linear relationship between x and y is. The closer r is to +1 or –1, the stronger
the relationship; the closer r is to zero, the weaker the relationship. Minitab
can do scatterplots and correlations for you; see Chapter 4 for more on
simple linear regression, including making a scatterplot and finding the
value of r.
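
As a rough illustration of the correlation step described above, here is a minimal Python sketch that computes r by hand. The data values and the function name pearson_r are made up for this example; they are not from the book or from Minitab, which would report the same statistic through its own menus.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only (not the book's data set)
education_years = [8, 10, 12, 12, 14, 16, 16, 18]
internet_hours = [2, 5, 4, 7, 6, 9, 7, 10]

r = pearson_r(education_years, internet_hours)
print(f"r = {r:.3f}")  # closer to +1 or -1 means a stronger linear relationship
```

A value of r near zero on data like these would suggest there is no straight-line pattern worth fitting, although a scatterplot is still needed to rule out a curve.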
If the data don’t have a significant correlation and/or the scatterplot doesn’t
look linear, stop the analysis; you can’t go further to find a line that fits a
relationship that doesn’t exist.
Using R²
The more general way of assessing not only the fit of a simple linear
regression model but many other models too is to use R², also known as the
coefficient of determination. (For example, you can use this method in multiple,
nonlinear, and logistic regression models in Chapters 5, 7, and 8, to name a
few.) In simple linear regression, the value of R² (indicated by Minitab and
statisticians as a capital R squared) is equal to the square of the Pearson
correlation coefficient, r (indicated by Minitab and statisticians by a small r).
In all other situations, R² provides a more general measure of model fit. (Note
that r only measures the fit of a straight-line relationship between one x
variable and one y variable; see Chapter 4.) An even better statistic, R² adjusted,
modifies R² to account for the number of variables in the model. (For more
information on R² and its use and interpretation, see Chapter 6.)
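
The relationships in the paragraph above can be sketched in Python. This example fits a least-squares line, computes R² as 1 - SSE/SST, checks that it matches the square of r for a one-variable model, and then applies the standard textbook formula for R² adjusted, 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of x variables. The data and all function names are hypothetical, and the adjusted formula is assumed to be the one the book has in mind.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(ys, fitted):
    """R^2 = 1 - SSE/SST: the share of variability in y the model explains."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

def adjusted_r_squared(r2, n, k):
    """Standard adjustment for n observations and k x variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only (not the book's data set)
xs = [8, 10, 12, 12, 14, 16, 16, 18]
ys = [2, 5, 4, 7, 6, 9, 7, 10]

slope, intercept = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]

r2 = r_squared(ys, fitted)
r2_adj = adjusted_r_squared(r2, n=len(xs), k=1)
r = pearson_r(xs, ys)

print(f"R^2          = {r2:.4f}")
print(f"r squared    = {r * r:.4f}")   # matches R^2 in simple linear regression
print(f"R^2 adjusted = {r2_adj:.4f}")  # never larger than R^2
```

The adjustment penalizes extra x variables: with k = 1 the two values stay close, but adding variables that explain little would pull R² adjusted down while plain R² could only stay the same or rise.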
The value of R² adjusted for the model of using education to estimate Internet
use (see Figure 12-1) is equal to 41 percent. This value reflects the percentage
of variability in Internet use that can be explained by a person’s years of
education. This number isn’t close to 100 percent, but note that the square root
of 0.41 is about 0.64. In simple linear regression that is roughly the size of r
(strictly, r squared equals the unadjusted R², which is slightly higher than R²
adjusted), and it indicates a moderate relationship.
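
The arithmetic in the paragraph above is easy to check. This short Python sketch takes the 41 percent figure quoted in the text and computes its square root; the variable names are chosen here for illustration.

```python
import math

r2_adjusted = 0.41  # value quoted in the text for the education model

# The square root gives only the magnitude of the correlation;
# the sign of r comes from the direction of the fitted slope.
root = math.sqrt(r2_adjusted)
print(f"{root:.2f}")  # prints 0.64
```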

