Page 282 - Statistics II for Dummies


266 Part IV: Building Strong Connections with Chi-Square Tests

Compare what I observed in my sample (column two of Table 15-2) to what I
expected to get (column three of Table 15-2). Notice that I observed lower
percentages of brown, red, and blue M&M’S than expected, and higher
percentages of yellow, orange, and green M&M’S than expected. Sample results
vary by random chance from sample to sample, so the differences I observed
may just be due to this chance variation. But could the differences instead
indicate that the candies aren’t following the percentages reported by Mars?

It stands to reason that if the differences between what you observed and
what you expected are small, you should attribute them to chance and let the
expected model stand. On the other hand, if the differences are large enough,
you may have enough evidence to indicate that the expected model has some
problems. How do you know which conclusion to make? The operative phrase is
“if the differences are large enough,” so you need to quantify the term “large
enough.” Doing so takes a bit more machinery, which I cover in the next
section.

Calculating the goodness-of-fit statistic

The goodness-of-fit statistic is a single number that summarizes the total
amount of difference between the number you expect in each cell and the
number you observe. The term cell refers to each individual category within a
table. For example, in the M&M’S example, the first columns of Tables 15-1
and 15-2 contain six cells, one for each color of M&M. For any cell, the number
of items you observe in that cell is called the observed cell count. The number
of items you expect in that cell (under the given model) is called the expected
cell count. You get the expected cell count by multiplying the expected cell
percentage by the sample size.
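
To make these definitions concrete, here’s a minimal Python sketch that computes the expected cell count for each cell from a model’s percentages and the sample size. The function name expected_counts and the three-cell model (50, 30, and 20 percent) are hypothetical, chosen only for illustration; they aren’t the M&M’S percentages from Table 15-1.

```python
def expected_counts(expected_pcts, sample_size):
    """Return the expected cell count for each cell.

    expected_pcts: mapping from each cell (category) to its expected
    percentage under the model, written as a proportion (0.20 for 20%).
    sample_size: total number of items observed across all cells.
    """
    # Sanity check: a complete model's proportions should add up to 1.
    total = sum(expected_pcts.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"proportions sum to {total}, not 1")
    # Expected cell count = expected cell percentage x sample size.
    return {cell: p * sample_size for cell, p in expected_pcts.items()}


# Hypothetical three-cell model (not the M&M'S percentages from the text).
model = {"A": 0.50, "B": 0.30, "C": 0.20}
print(expected_counts(model, 150))
```

With 150 items, the expected cell counts come out to 75, 45, and 30; the observed cell counts would come from your own sample.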

The expected cell count is just a proportion of the total, so it doesn’t have to
be a whole number. For example, if you roll a fair die 200 times, you should
expect to roll ones 1/6, or 16.67 percent, of the time. In terms of the number
of ones you expect, it should be (1/6) * 200 ≈ 33.33. Use the 33.33 in your
calculations for goodness-of-fit; don’t round to a whole number. Your final
answer is more accurate that way.
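
Here’s a short sketch of the die example, assuming Python’s standard fractions module to keep the 1/6 exact. It also shows one concrete cost of rounding: six rounded expected counts of 33 add up to 198, not to the 200 rolls you actually made.

```python
from fractions import Fraction

ROLLS = 200  # sample size from the die example
FACES = 6    # a fair die has six equally likely faces

# Expected proportion for each face under the fair-die model: 1/6.
p_face = Fraction(1, FACES)

# Expected cell count for each face = proportion x sample size.
expected_exact = p_face * ROLLS  # Fraction(100, 3), about 33.33
expected_float = float(expected_exact)

# Rounding each expected count to a whole number loses information:
# the six rounded counts no longer add up to the sample size.
expected_rounded = round(expected_float)

print(f"expected ones: {expected_float:.2f}")                     # expected ones: 33.33
print("sum of unrounded expected counts:", FACES * expected_exact)  # 200
print("sum of rounded expected counts:", FACES * expected_rounded)  # 198
```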

The goodness-of-fit statistic is based on the number in each cell rather than
the percentage in each cell because percentages can be deceiving. If you know
that 8 out of 10 people support a certain view, that’s 80 percent. But 80 out
of 100 is also 80 percent. Which one would you consider the more precise
statistic? The 80 out of 100, because it’s based on more information. Using
percentages alone disregards the sample size; using the counts (the number in
each group) keeps track of how much precision you have.
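
To see why counts matter, here’s a sketch that computes a goodness-of-fit value for both samples. It assumes the standard Pearson chi-square form, the sum over all cells of (observed - expected)^2 / expected, and a hypothetical model in which 50 percent of people are expected to support the view; the function names are also illustrative.

```python
def pearson_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def two_cell_statistic(supporters, n, expected_share=0.5):
    # Hypothetical model: expected_share of the n people support the view.
    observed = [supporters, n - supporters]
    expected = [expected_share * n, (1 - expected_share) * n]
    return pearson_statistic(observed, expected)


small = two_cell_statistic(8, 10)    # 80 percent of 10 people
large = two_cell_statistic(80, 100)  # 80 percent of 100 people
print(small, large)                  # 3.6 36.0
```

The percentages are identical, but the larger sample produces a statistic ten times as big, because the counts carry the sample size along with them.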

