Page 43 - Intermediate Statistics for Dummies
Part I: Data Analysis and Model-Building Basics
The trouble is that people often just report the sample statistics and give no
regard to the expected amount of change with a new sample. This disregard
leads to big mistakes in the conclusions (more on hypothesis testing in
Chapter 3).
Analysis of variance (ANOVA)
ANOVA is the acronym for analysis of variance. You use ANOVA in situations
where you want to compare the means of more than two populations. For
example, suppose you want to compare the lifetimes, in miles, of four brands
of tires. You take a random sample of 50 tires from each brand, for a total of
200 tires, and set up an experiment to measure and record the lifetime of each
tire. You now have four means and four standard deviations, one for
each data set. But you have different types of variability in your data, each
measured by its own sum of squares. (Remember from your intro stats course
that the variance of a data set is the sum of the squared distances
between each data value and the mean, divided by n – 1.)
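As a reminder, that variance formula can be written out as follows. The symbols here are notation assumed for illustration and are not used in the passage itself: x_i stands for the i-th data value, x-bar for the sample mean, and n for the number of values.

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
```

The numerator is a sum of squares, which is the same kind of quantity ANOVA works with.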
One type of variability in your data is called the variability between
treatments (also known as SST, the treatment sums of squares). SST measures
how much the average lifetime of each brand of tire varies from
the overall average lifetime. If SST is large, there may be a
difference in lifetimes due to the treatment (in this case, the brand of tire).
Next, you have the variability within the treatments (also known as SSE, the
error sums of squares). SSE measures the overall amount of variability
of the tire lifetimes within each particular brand (after all, not all tires are
created equal, even if they’re of the same brand). If SSE is large, you have so
much variability within the tire brands themselves that it’s harder to
see any real difference between the brands, even if one actually exists.
And finally, you have the total overall variability in the data values if you just
put them all together into one big data set. This variability is known as SSTO,
the total sums of squares. ANOVA splits up the total variability (SSTO) into
the between-groups variability (SST) plus the within-groups variability (SSE).
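To make the split concrete, here is a minimal sketch in Python that computes the three sums of squares and checks that they add up. The lifetimes are made-up numbers (in thousands of miles, three tires per brand rather than 50), and the names `lifetimes` and `sums_of_squares` are hypothetical, chosen only for this illustration.

```python
from statistics import mean

# Hypothetical tire lifetimes in thousands of miles (made-up data,
# far smaller than the 50 tires per brand described in the text).
lifetimes = {
    "A": [40.0, 42.0, 41.0],
    "B": [45.0, 47.0, 46.0],
    "C": [39.0, 38.0, 40.0],
    "D": [50.0, 52.0, 51.0],
}

def sums_of_squares(groups):
    """Return (SST, SSE, SSTO) for a dict mapping brand -> list of values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    # Between treatments: each brand mean versus the overall mean,
    # weighted by the number of tires in that brand.
    sst = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within treatments: each tire versus its own brand mean.
    sse = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    # Total: each tire versus the overall mean, ignoring brand.
    ssto = sum((x - grand_mean) ** 2 for x in all_values)
    return sst, sse, ssto

sst, sse, ssto = sums_of_squares(lifetimes)
print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SSTO = {ssto:.2f}")
print("SST + SSE equals SSTO:", abs(sst + sse - ssto) < 1e-9)
```

In this made-up data the brands differ a lot while tires within a brand are nearly identical, so almost all of SSTO ends up in SST.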
Then, to test for differences in average lifetime for the four brands of tires, you
compare the mean sums of squares for treatments (MST) to the mean sums
of squares for error (MSE) in a ratio called the F-statistic. If this ratio is large,
then the variability between the brands is more than the variability within the
brands, giving evidence that not all the means are the same for the different
tire brands. If the F-statistic is small, the treatment means don’t differ
enough compared to the general variability within the
treatments themselves. In this case, you can’t say that the means are different
for the groups. (I give you the full scoop on ANOVA in Chapters 9 and 10.)
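The ratio described above can be sketched in Python as follows. The passage doesn't spell out how the mean sums of squares are formed, so this sketch assumes the standard one-way ANOVA definitions: MST = SST / (k – 1) and MSE = SSE / (N – k), where k is the number of brands and N is the total number of tires. The data and the name `f_statistic` are hypothetical.

```python
from statistics import mean

# Hypothetical tire lifetimes in thousands of miles (made-up data).
lifetimes = {
    "A": [40.0, 42.0, 41.0],
    "B": [45.0, 47.0, 46.0],
    "C": [39.0, 38.0, 40.0],
    "D": [50.0, 52.0, 51.0],
}

def f_statistic(groups):
    """Return F = MST / MSE, assuming the usual one-way ANOVA setup."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k = len(groups)            # number of treatments (brands)
    n_total = len(all_values)  # total number of observations (tires)
    sst = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    sse = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    mst = sst / (k - 1)        # assumed definition: SST over k - 1
    mse = sse / (n_total - k)  # assumed definition: SSE over N - k
    return mst / mse

f = f_statistic(lifetimes)
print(f"F = {f:.2f}")
```

How large F has to be before it counts as evidence of a difference is judged against an F-distribution, which this sketch leaves out.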