Page 43 - Intermediate Statistics for Dummies
                                         Part I: Data Analysis and Model-Building Basics
The trouble is that people often just report the sample statistics and give no
regard to how much those statistics are expected to change with a new sample.
This disregard can lead to big mistakes in the conclusions (more on hypothesis
testing in Chapter 3).
                                                    Analysis of variance (ANOVA)
ANOVA is the acronym for analysis of variance. You use ANOVA in situations
where you want to compare the means of more than two populations. For
example, suppose you want to compare the lifetimes of four brands of tires, in
number of miles. You take a random sample of 50 tires from each brand, for a
total of 200 tires, set up an experiment to measure the lifetime of each tire,
and record the results. You now have four means and four standard deviations,
one for each data set. But your data contain different types of variability, each
measured by using a different sum of squares. (Remember from your intro stats
course that the variance of a data set is the total of all the squared distances
between the data values and the mean, divided by n – 1.)
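As a quick check of that definition, here is a minimal Python sketch of the sample variance. The helper name `sample_variance` and the lifetimes (in thousands of miles) are made up for illustration; Python's standard `statistics.variance` also divides by n – 1, so it serves as a cross-check.

```python
import statistics


def sample_variance(data):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(data) / n
    squared_distances = [(x - mean) ** 2 for x in data]
    return sum(squared_distances) / (n - 1)


# Hypothetical tire lifetimes (thousands of miles), for illustration only
lifetimes = [52.0, 48.0, 50.0, 54.0, 46.0]
print(sample_variance(lifetimes))      # 10.0
print(statistics.variance(lifetimes))  # 10.0 (same n - 1 divisor)
```

The mean here is 50, the squared distances are 4, 4, 0, 16, and 16, and their total of 40 divided by 5 – 1 = 4 gives 10.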
One of the types of variability in your data is called the variability between
treatments (also known as SST, the treatment sum of squares). SST measures
how far the average lifetime for each brand of tire falls from the overall
average lifetime. If SST is large, there may be a difference in lifetimes due to
the treatment (in this case, the brand of tire).
Next, you have the variability within the treatments (also known as SSE, the
error sum of squares). SSE measures the variability of the tire lifetimes within
each brand, added up over all four brands (after all, not all tires are created
equal, even if they're the same brand). If SSE is large, there's so much
variability within the tire brands themselves that it's harder to see any real
difference between the brands, even if one actually exists.
And finally, you have the total variability in the data values if you just put
them all together into one big data set. This variability is known as SSTO, the
total sum of squares. ANOVA splits the total variability (SSTO) into the
between-groups variability (SST) plus the within-groups variability (SSE); that
is, SSTO = SST + SSE.
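In symbols, the three sums of squares can be written as follows. The notation is introduced here for illustration and is not taken from the book: there are k brands, brand i has n_i tires with lifetimes y_{ij}, \bar{y}_i is the mean for brand i, and \bar{y} is the overall mean of all N tires.

```latex
\begin{aligned}
\text{SST}  &= \sum_{i=1}^{k} n_i \, (\bar{y}_i - \bar{y})^2 \\
\text{SSE}  &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \\
\text{SSTO} &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 = \text{SST} + \text{SSE}
\end{aligned}
```

SST compares each brand mean with the overall mean (weighted by how many tires are in that brand), SSE compares each tire with its own brand mean, and SSTO compares each tire with the overall mean.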
Then, to test for differences in average lifetime among the four brands of tires,
you compare the mean square for treatments (MST) to the mean square for
error (MSE) in a ratio called the F-statistic. (Each mean square is its sum of
squares divided by its degrees of freedom.) If this ratio is large, then the
variability between the brands is large compared with the variability within the
brands, giving evidence that not all the means are the same for the different
tire brands. If the F-statistic is small, the treatment means don't differ by
enough, compared to the general variability within the treatments themselves.
In this case, you can't say that the means are different for the groups. (I give
you the full scoop on ANOVA in Chapters 9 and 10.)
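The calculation can be sketched in a few lines of Python, assuming the standard one-way ANOVA degrees of freedom: k – 1 for treatments and N – k for error, where k is the number of groups and N is the total number of observations. The function name `one_way_anova` and the lifetimes (thousands of miles, three tires per brand rather than 50) are made up for illustration.

```python
def one_way_anova(groups):
    """Return (SST, SSE, SSTO, F) for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between treatments: each group mean vs. the grand mean, weighted by group size
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within treatments: each value vs. its own group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Total: each value vs. the grand mean (equals sst + sse)
    ssto = sum((x - grand_mean) ** 2 for x in all_values)

    mst = sst / (k - 1)        # mean square for treatments
    mse = sse / (n_total - k)  # mean square for error
    return sst, sse, ssto, mst / mse


# Hypothetical lifetimes for four tire brands (illustration only)
groups = [
    [50, 52, 54],  # brand A, mean 52
    [46, 48, 50],  # brand B, mean 48
    [55, 57, 59],  # brand C, mean 57
    [49, 51, 53],  # brand D, mean 51
]
sst, sse, ssto, f = one_way_anova(groups)
print(sst, sse, ssto, f)  # 126.0 32.0 158.0 10.5
```

Here MST = 126 / 3 = 42 and MSE = 32 / 8 = 4, so F = 10.5, and SST + SSE matches SSTO. With the book's setup (4 brands, 50 tires each), the degrees of freedom would be 3 and 196. If SciPy is installed, `scipy.stats.f_oneway(*groups)` should report the same F along with a p-value.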