Page 217 - Intermediate Statistics for Dummies

                                         Part III: Comparing Many Means with ANOVA
                                                    y-values by themselves and see how their variability plays a central role
                                                    in the regression model. This is the first step toward applying ANOVA
                                                    (the analysis of variance) to the regression model.
                                                    Verifying variability in the y’s and looking at x to explain it
                                                    No matter what y variable you’re interested in predicting, you will always
                                                    have variability in those y-values. If you want to predict the length of a fish,
                                                    you may notice that fish have many different lengths (indicating a great deal
                                                    of variability). Even if you put all the fish of the same age and species together,
                                                    you still have some variability in their lengths (it will be less than before, but
                                                    still there nonetheless). The first step to understanding the basic ideas of
                                                    regression and ANOVA is to understand that variability in the y’s is to be
                                                    expected, and your job is to try to figure out what can explain most of it. This
                                                    section deals with seeing and explaining variability in the y-values.
                                                    Seeing the variability in Internet use
                                                    Both regression and ANOVA work to get a handle on explaining the variability
                                                    in the y variable using an x variable. After you collect your data, you can find
                                                    the standard deviation in the y variable to get a sense of how much the data
                                                    varies within the sample. From there, you collect data on an x variable and
                                                    see how much it contributes to explaining that variability.
                                                    Suppose you notice that people spend different amounts of time on the
                                                    Internet, and you want to explore why that may be. You start by taking a
                                                    small sample of 20 people and record how many hours per month they spend
                                                    on the Internet. The results (in hours) are 20, 20, 22, 39, 40, 19, 20, 32, 33, 29,
                                                    24, 26, 30, 46, 37, 26, 45, 15, 24, and 31. The first thing you notice about this
                                                    data is the large amount of variability in it. The standard deviation (roughly,
                                                    the typical distance from the data values to their mean) of this data set is
                                                    8.93 hours, which is quite large given the size of the numbers in the data set:
                                                    it is about 31 percent of the mean of 28.9 hours.
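The figure of 8.93 can be checked with a few lines of Python. This is only a minimal sketch: the variable names are made up for illustration, and it assumes the sample standard deviation (dividing by n - 1 rather than n), which is the version that reproduces the value quoted above.

```python
import statistics

# Hours per month on the Internet for the 20 people in the sample
hours = [20, 20, 22, 39, 40, 19, 20, 32, 33, 29,
         24, 26, 30, 46, 37, 26, 45, 15, 24, 31]

mean_hours = statistics.mean(hours)   # center of the data
sd_hours = statistics.stdev(hours)    # sample standard deviation (divides by n - 1)

print(f"n    = {len(hours)}")
print(f"mean = {mean_hours:.2f} hours")   # 28.90
print(f"sd   = {sd_hours:.2f} hours")     # 8.93
```

Swapping in `statistics.pstdev(hours)` divides by n instead and gives a slightly smaller number, so the choice of divisor matters a little with a sample this small.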
                                                    Finding an “x-planation” for Internet use
                                                    So you’ve established that the y-values (such as the amount of time someone
                                                    spends on the Internet, from the preceding section) have a great deal of
                                                    variability in them.
                                                    What can help explain this? Part of the variability is due to chance. But you