Page 217 - Intermediate Statistics for Dummies
Part III: Comparing Many Means with ANOVA
y-values by themselves and see how their variability plays a central role
in the regression model. This is the first step toward applying ANOVA
(the analysis of variance) to the regression model.
Verifying variability in the y’s and looking at x to explain it
No matter what y variable you’re interested in predicting, you will always
have variability in those y-values. If you want to predict the length of a fish,
you may notice that fish have many different lengths (indicating a great deal
of variability). Even if you put all the fish of the same age and species together,
you still have some variability in their lengths (less than before, but
still there). The first step to understanding the basic ideas of
regression and ANOVA is to understand that variability in the y’s is to be
expected, and your job is to try to figure out what can explain most of it. This
section deals with seeing and explaining variability in the y-values.
Seeing the variability in Internet use
Both regression and ANOVA work to get a handle on explaining the variability
in the y variable using an x variable. After you collect your data, you can find
the standard deviation of the y variable to get a sense of how much the data
varies within the sample. From there, you collect data on an x variable and
see how much it contributes to explaining that variability.
Suppose you notice that people spend different amounts of time on the
Internet, and you want to explore why that may be. You start by taking a
small sample of 20 people and recording how many hours per month they spend
on the Internet. The results (in hours) are 20, 20, 22, 39, 40, 19, 20, 32, 33, 29,
24, 26, 30, 46, 37, 26, 45, 15, 24, and 31. The first thing you notice about this
data is the large amount of variability in it. The sample standard deviation
(roughly, the typical distance from the data values to their mean) of this data
set is 8.93 hours, which is quite large relative to the sample mean of 28.9 hours.
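
As a quick check on that figure, here is a minimal Python sketch that recomputes the mean and standard deviation of the 20 values listed above. It assumes the 8.93 is the sample standard deviation (dividing by n - 1), which is what the numbers bear out. The variable names are illustrative only and do not come from the book.

```python
import statistics

# Monthly Internet use (hours) for the 20 people in the sample, as listed in the text.
hours = [20, 20, 22, 39, 40, 19, 20, 32, 33, 29,
         24, 26, 30, 46, 37, 26, 45, 15, 24, 31]

n = len(hours)
mean_hours = statistics.mean(hours)

# Sample standard deviation (assumption: the n - 1 divisor is the one intended).
sd_hours = statistics.stdev(hours)

# The same quantity by hand: sum of squared deviations from the mean,
# divided by n - 1, then square-rooted.
sum_sq_dev = sum((y - mean_hours) ** 2 for y in hours)
sd_by_hand = (sum_sq_dev / (n - 1)) ** 0.5

print(f"n = {n}, mean = {mean_hours:.1f}, sd = {sd_hours:.2f}")
# prints: n = 20, mean = 28.9, sd = 8.93
```

The hand calculation is included only to show what the library call is doing: the sum of squared deviations (about 1515.8 here) is the raw variability in the y-values that an x variable will later be asked to explain.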
Finding an “x-planation” for Internet use
So you figure out that the y-values (such as the amount of time someone uses the
Internet, from the preceding section) have a great deal of variability in them.
What can help explain this? Part of the variability is due to chance. But you

