Page 224 - Statistics II for Dummies

Part III: Analyzing Variance with ANOVA



                      Seeing Regression through the Eyes of Variation


                                Every basic statistical model tries to explain why the different outcomes (y)
                                are what they are. It does so by figuring out which factors, or explanatory
                                variables (x), can help explain the variability in those y’s. In this section,
                                you start with the y-values by themselves and see how their variability plays
                                a central role in the regression model. This is the first step toward applying
                                ANOVA to the regression model.

                                No matter what y variable you’re interested in predicting, you’ll always have
                                variability in those y-values. If you want to predict the length of a fish, for
                                example, you know that fish have many different lengths (indicating a great
                                deal of variability). Even if you put all the fish of the same age and species
                                together, you still have some variability in their lengths (less than before,
                                but still there). The first step in understanding the basic ideas of
                                regression and ANOVA is to understand that variability in the y’s is to be
                                expected, and your job is to try to figure out what can explain most of it.


                                Spotting variability and finding an “x-planation”


                                Both regression and ANOVA work to get a handle on explaining the variability
                                in the y variable using an x variable. After you collect your data, you can
                                find the standard deviation of the y variable to get a sense of how much the
                                data varies within the sample. From there, you collect data on an x variable
                                and see how much it contributes to explaining that variability.

                                Suppose you notice that people spend different amounts of time on the
                                Internet, and you want to explore why that may be. You start by taking a
                                small sample of 20 people and recording how many hours per month they spend
                                on the Internet. The results (in hours) are 20, 20, 22, 39, 40, 19, 20, 32, 33, 29,
                                24, 26, 30, 46, 37, 26, 45, 15, 24, and 31. The first thing you notice about this
                                data is the large amount of variability in it. The standard deviation (roughly
                                the typical distance from the data values to their mean) of this data set is
                                8.93 hours, which is quite large given the size of the numbers in the data set:
                                it is about 31 percent of the sample mean of 28.9 hours.
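
As a quick check on that figure, here is a minimal sketch in Python (a language the text itself doesn't use; the variable names are illustrative). It assumes the sample standard deviation, which divides by n - 1 rather than n. That assumption is the one that reproduces the 8.93 hours quoted above.

```python
import statistics

# Hours per month on the Internet for the 20 people in the sample
hours = [20, 20, 22, 39, 40, 19, 20, 32, 33, 29,
         24, 26, 30, 46, 37, 26, 45, 15, 24, 31]

mean_hours = statistics.mean(hours)   # sum of 578 divided by n = 20
sd_hours = statistics.stdev(hours)    # sample SD (n - 1 in the denominator)

print(f"n = {len(hours)}")
print(f"mean = {mean_hours:.2f} hours")
print(f"sample SD = {sd_hours:.2f} hours")
```

Dividing by n instead (statistics.pstdev) gives a slightly smaller value, so the choice of formula matters when comparing against a published figure.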

                                So you’ve figured out that the y-values (the amounts of time people spend on
                                the Internet) have a great deal of variability in them. What can help explain
                                this? Part of the variability is due to chance. But you suspect some variable
                                is out there (call it x) that has some connection to the y variable, and that x
                                variable can help you make more sense of this seemingly wide range of
                                y-values.







