Page 220 - Intermediate Statistics for Dummies
Chapter 12: Rock My World: Relating Regression to ANOVA
Regression Analysis: Internet versus Education

The regression equation is
Internet = −8.29 + 3.15 Education

Predictor     Coef      SE Coef   T        P
Constant      −8.290    2.665     −3.11    0.002
Education     3.1460    0.2387    13.18    0.000

S = 7.23134   R-Sq = 41.2%   R-Sq(adj) = 41.0%

Analysis of Variance

Source            DF     SS         MS        F         P
Regression        1      9085.6     9085.6    173.75    0.000
Residual Error    248    12968.5    52.3
Total             249    22054.0

Figure 12-1: Output for simple linear regression applied to education and Internet use data.

Looking at Figure 12-1, you see that the p-value on the row marked Education
is 0.000, which means the p-value is less than 0.001. Therefore, the relationship
between years of education and Internet use is statistically significant. A
scatterplot of the data (not shown here) also indicates that the data appear to
have a positive linear relationship. That means as the number of years of
education increases, Internet use also tends to increase (on average).
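
The numbers in Figure 12-1 are tied together by simple arithmetic, and checking them is a quick way to see how the regression part of the output relates to the ANOVA part. The Python sketch below recomputes the t statistic, the F statistic, and R-Sq from the values printed in the figure. The variable names are illustrative only and aren't part of Minitab.

```python
# Values copied from the Minitab output in Figure 12-1.
coef_education = 3.1460   # slope for Education
se_education = 0.2387     # standard error of the slope
ss_regression = 9085.6    # SS for the Regression row
ss_error = 12968.5        # SS for the Residual Error row
ss_total = 22054.0        # SS for the Total row
df_regression = 1
df_error = 248

# t statistic for the slope: coefficient divided by its standard error.
t_slope = coef_education / se_education

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# F statistic from the ANOVA table: MS Regression over MS Residual Error.
f_stat = ms_regression / ms_error

# R-Sq is the share of the total variation explained by the regression.
r_squared = ss_regression / ss_total

print(f"t = {t_slope:.2f}")
print(f"F = {f_stat:.2f}")
print(f"t squared = {t_slope ** 2:.2f}")
print(f"R-Sq = {r_squared:.1%}")
```

With a single x variable, the F statistic equals the square of the slope's t statistic (apart from rounding in the printed values), which is why the Education row and the Regression row report the same p-value.
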
Assessing the fit of the regression model
Before you go ahead and use a regression model to make predictions for y
based on an x variable, you must first assess the fit of your model. One way
to get a rough idea of how well your regression model fits is by using a
scatterplot (a graph showing all the pairs of data plotted in the x-y plane).
Use the scatterplot to see whether the data appear to fall in the pattern of a
line. If the data appear to follow a straight-line pattern, or even something
close to that (anything but a curve or a scattering of points that has no
pattern at all), you calculate the correlation, r, to see how strong the linear
relationship between x and y is. The closer r is to +1 or –1, the stronger the
relationship; the closer r is to zero, the weaker the relationship. Minitab can
do scatterplots and correlations for you; see Chapter 4 for more on simple
linear regression, including making a scatterplot and finding the value of r.
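
If you want to see what goes into r without Minitab, the calculation can be sketched in a few lines of Python. The function name pearson_r and the small data set below are made up for illustration; they are not the education and Internet use data behind Figure 12-1.

```python
import math

def pearson_r(xs, ys):
    """Return the sample correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only (NOT the book's data set):
# years of education and a made-up measure of Internet use.
education = [8, 10, 12, 12, 14, 16, 16, 18]
internet = [15, 30, 20, 40, 35, 30, 50, 45]

r = pearson_r(education, internet)
print(f"r = {r:.3f}")
```

A value of r near +1 or –1 points to a strong linear pattern, and a value near zero points to a weak one, which matches what you look for in the scatterplot.
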
If the data don't have a significant correlation, stop the analysis; you can't
go further to find a line that fits a relationship that doesn't exist.
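
One common way to check whether a correlation is significant is the t statistic t = r × sqrt(n − 2) / sqrt(1 − r²), with n − 2 degrees of freedom. This page doesn't spell out that formula, so take the Python sketch below as an illustration under that assumption. It uses values read from Figure 12-1: r is recovered from R-Sq = 41.2% (positive, because the slope is positive), and n = 250 is inferred from the 249 total degrees of freedom. The function name is hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing whether the population correlation is zero."""
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least 3 data pairs")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumptions taken from Figure 12-1: R-Sq = 41.2%, positive slope,
# and n = 250 because the Total row shows 249 degrees of freedom.
r = math.sqrt(0.412)
n = 250

t_corr = correlation_t_statistic(r, n)
print(f"r = {r:.3f}, t = {t_corr:.2f}, df = {n - 2}")
```

In simple linear regression, this statistic matches the t value on the Education row of Figure 12-1 up to rounding, so testing the correlation and testing the slope lead to the same conclusion.
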

