Page 282 - Statistics II for Dummies

266        Part IV: Building Strong Connections with Chi-Square Tests



                                Compare what I observed in my sample (column two of Table 15-2) to what I
                                expected to get (column three of Table 15-2). Notice that I observed a lower
                                percentage of brown, red, and blue M&M’S than expected and a higher
                                percentage of yellow, orange, and green M&M’S than expected.
                                Sample results vary by random chance,
                                from sample to sample, and the difference I observed may just be due to
                                this chance variation. But could the differences indicate that the expected
                                percentages reported by Mars aren’t being followed?

                                It stands to reason that if the differences between what you observed and
                                what you expected are small, you should attribute that difference to chance
                                and let the expected model stand. On the other hand, if the differences
                                between what you observed and what you expected are large enough, you
                                may have enough evidence to indicate that the expected model has some
                                problems. How do you know which conclusion to draw? The operative
                                phrase is, “if the differences are large enough.” You need to quantify what
                                “large enough” means. Doing so takes a bit more machinery, which I cover in the
                                next section.



                                Calculating the goodness-of-fit statistic

                                The goodness-of-fit statistic is one number that puts together the total
                                amount of difference between what you expect in each cell and
                                the number you observe. The term cell is used to express each individual
                                category within a table format. For example, with the M&M’S example, the
                                first columns of Tables 15-1 and 15-2 contain six cells, one for each color of
                                M&M. For any cell, the number of items you observe in that cell is called the
                                observed cell count. The number of items you expect in that cell (under the
                                given model) is called the expected cell count. You get the expected cell count
                                by taking the expected cell percentage times the sample size.
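
As a minimal sketch of that last step, the Python snippet below multiplies each expected cell percentage by the sample size. The function name expected_counts, the three cells, their percentages, and the sample size of 60 are all hypothetical; they are not the Mars percentages from Table 15-1.

```python
def expected_counts(expected_percents, sample_size):
    """Expected cell count = expected cell percentage times the sample size."""
    return {
        cell: percent * sample_size / 100
        for cell, percent in expected_percents.items()
    }


# Hypothetical model with three cells (NOT the values from Table 15-1).
model_percents = {"cell A": 50, "cell B": 30, "cell C": 20}

counts = expected_counts(model_percents, sample_size=60)
for cell, count in counts.items():
    print(cell, count)
```

Because the percentages add up to 100, the expected cell counts add up to the sample size.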

                                The expected cell count is just a proportion of the total, so it doesn’t have to
                                be a whole number. For example, if you roll a fair die 200 times, you should
                                expect to roll ones 1/6, or about 16.67 percent, of the time. In terms of the number
                                of ones you expect, it should be 1/6 * 200 = 33.33 (to two decimal places). Use the 33.33 in your
                                calculations for goodness-of-fit; don’t round to a whole number. Your final
                                answer is more accurate that way.
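
One way to see why rounding hurts is to work the die example with exact fractions. This is only a sketch: the variable names are mine, and using Python's fractions module is just a convenient way to avoid decimal round-off, not something the method requires.

```python
from fractions import Fraction

rolls = 200
p_face = Fraction(1, 6)  # a fair die: each face has probability 1/6

# Expected number of ones, kept exact: 200 * 1/6 = 100/3 (about 33.33).
expected_ones = p_face * rolls
print(float(expected_ones))

# Expected counts for all six faces, unrounded versus rounded.
exact_total = sum(p_face * rolls for _ in range(6))
rounded_total = sum(round(float(p_face * rolls)) for _ in range(6))

print(exact_total)    # the unrounded expected counts add up to all 200 rolls
print(rounded_total)  # rounding each cell to 33 accounts for only 198 rolls
```

The unrounded expected counts account for every roll, while the rounded ones quietly lose two of them.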

                                The goodness-of-fit statistic is based on the number in each cell
                                rather than the percentage in each cell because percents are a bit deceiving. If
                                you know that 8 out of 10 people support a certain view, that’s 80 percent. But
                                80 out of 100 is also 80 percent. Which one would you feel is a more-precise
                                statistic? The 80 out of 100, because it uses more information. Using
                                percents alone disregards the sample size. Using the counts (the number in
                                each group) keeps track of the amount of precision you have.
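
To put a rough number on that difference in precision, the sketch below compares the two samples using the usual standard error of a sample proportion, sqrt(p(1 - p)/n). Choosing the standard error as the yardstick is my assumption for illustration; the passage above doesn't name a specific measure, and the function name is hypothetical.

```python
import math


def standard_error(supporters, sample_size):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    p = supporters / sample_size
    return math.sqrt(p * (1 - p) / sample_size)


se_small = standard_error(8, 10)    # 8 out of 10
se_large = standard_error(80, 100)  # 80 out of 100

# Both samples show 80 percent, but the larger one has the smaller standard error.
print(round(se_small, 4), round(se_large, 4))
```

Both percentages are identical, yet the standard error shrinks as the sample grows, which is the information that counts preserve and percents throw away.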








