
                       20




                       Multiple Paired Comparisons of k Averages






KEY WORDS data snooping, data dredging, Dunnett's procedure, multiple comparisons, sliding reference distribution, studentized range, t-tests, Tukey's procedure.

                       The problem of comparing several averages arises in many contexts: compare five bioassay treatments
                       against a control, compare four new polymers for sludge conditioning, or compare eight new combina-
                       tions of media for treating odorous ventilation air. One multiple paired comparison problem is to compare
                       all possible pairs of k treatments. Another is to compare k – 1 treatments with a control.
Knowing how to do a t-test may tempt us to compare several combinations of treatments using a series of paired t-tests. If there are k treatments, the number of pair-wise comparisons that could be made is k(k – 1)/2. For k = 4 there are 6 possible comparisons, for k = 5 there are 10, for k = 10 there are 45, and for k = 15 there are 105. Checking 6, 10, 45, or even 105 comparisons is manageable but not recommended. Statisticians call this data snooping (Sokal and Rohlf, 1969) or data dredging (Tukey, 1991). We need to understand why data snooping is dangerous.
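The count k(k – 1)/2 is simply the number of ways to choose two treatments from k. As a quick check of the arithmetic above, a short Python sketch (illustrative only) enumerates these counts:

```python
# Number of pair-wise comparisons among k treatments: k(k - 1)/2,
# i.e., "k choose 2".
from math import comb

for k in (4, 5, 10, 15):
    print(f"k = {k:2d}: {comb(k, 2)} possible pair-wise comparisons")
# k =  4: 6,  k =  5: 10,  k = 10: 45,  k = 15: 105
```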
Suppose, to take a not-too-extreme example, that we have 15 different treatments. The number of possible pair-wise comparisons that could be made is 15(15 – 1)/2 = 105. If, before the results are known, we make one selected comparison using a t-test with a 100α% = 5% error rate, there is a 5% chance of reaching the wrong decision each time we repeat the data collection experiment for those two treatments. If, however, several pairs of treatments are tested for possible differences using this procedure, the error rate will be larger than the nominal 5% rate. Imagine that a two-sample t-test is used to compare the largest of the 15 average values against the smallest. The null hypothesis that this difference, the largest of all 105 possible pair-wise differences, is zero is likely to be rejected almost every time the experiment is repeated, instead of at just the 5% rate that would apply to making one pair-wise comparison selected at random from among the 105 possible comparisons.
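A small simulation makes the danger concrete. The sketch below, with illustrative choices of sample size, number of repetitions, and a common normal distribution for all 15 treatments (so every rejection is a false positive), estimates how often at least one of the 105 t-tests rejects at the individual 5% level:

```python
# Monte Carlo sketch of the inflated family error rate when all pair-wise
# t-tests are run at an individual 5% level. All 15 treatment means are
# equal, so any rejection is a wrong conclusion. Sample size n = 10 and
# 1000 repetitions are illustrative assumptions, not values from the text.
import numpy as np
from itertools import combinations
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
k, n, alpha, reps = 15, 10, 0.05, 1000

false_alarms = 0
for _ in range(reps):
    groups = [rng.normal(loc=0.0, scale=1.0, size=n) for _ in range(k)]
    # Does at least one of the 105 pair-wise tests reject?
    if any(ttest_ind(a, b).pvalue < alpha
           for a, b in combinations(groups, 2)):
        false_alarms += 1

print(f"Estimated family error rate: {false_alarms / reps:.2f}")
# Far above the nominal 0.05 individual rate.
```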
The number of comparisons does not have to be large for problems to arise. Suppose there are just three treatment methods and, of the three averages, A is larger than B and C is slightly larger than A (ȳ_C > ȳ_A > ȳ_B). It is possible for the three possible t-tests to indicate that A gives higher results than B (η_A > η_B), that A is not different from C (η_A = η_C), and that B is not different from C (η_B = η_C). This apparent contradiction can happen because different variances are used to make the different comparisons. Analysis of variance (Chapter 21) eliminates this problem by using a common variance to make a single test of significance (using the F statistic).
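For illustration, a hypothetical three-treatment data set (the values below are invented) can be submitted to the kind of single F test that analysis of variance uses in place of the three separate t-tests:

```python
# Sketch of a one-way analysis of variance on three treatments. The single
# F test pools a common variance estimate across all groups rather than
# computing a separate variance for each pair-wise comparison.
from scipy.stats import f_oneway

y_A = [10.2, 9.8, 10.5, 10.1]   # invented data for illustration
y_B = [9.5, 9.7, 9.4, 9.9]
y_C = [10.4, 10.6, 10.3, 10.2]

result = f_oneway(y_A, y_B, y_C)
print(f"F = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```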
The multiple comparison test is similar to a t-test, but an allowance is made in the error rate to keep the collective error rate at the stated level. This collective rate can be defined in two ways. Returning to the example of 15 treatments and 105 possible pair-wise comparisons, the probability of reaching the wrong conclusion on a single randomly selected comparison is the individual error rate. The family error rate (also called the Bonferroni error rate) is the chance of getting one or more of the 105 comparisons wrong in a repetition of data collection for all 15 treatments; an error on any one comparison counts as an error for the whole family. Thus, to make valid statistical comparisons, the individual per-comparison error rate must be shrunk to keep the simultaneous family error rate at the desired level.
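The arithmetic behind this shrinkage can be sketched as follows. Treating the 105 comparisons as independent is an approximation (the pair-wise tests share data, so they are correlated), but it shows the scale of the problem and the simple Bonferroni remedy of dividing the family rate by the number of comparisons:

```python
# Error-rate bookkeeping for m comparisons, assuming independence for
# simplicity. With each test run at individual rate alpha, the chance of
# at least one wrong conclusion is 1 - (1 - alpha)**m. The Bonferroni
# correction shrinks the individual rate to alpha_family / m.
m = 105        # pair-wise comparisons among k = 15 treatments
alpha = 0.05

family_rate = 1 - (1 - alpha) ** m
print(f"Family error rate if each test uses 5%: {family_rate:.3f}")  # ~0.995

alpha_individual = alpha / m   # Bonferroni-adjusted per-comparison rate
print(f"Per-comparison rate for a 5% family rate: {alpha_individual:.5f}")
```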





