Page 191 - Intermediate Statistics for Dummies

Part III: Comparing Many Means with ANOVA

Breaking down the variance into sums of squares
The first step of the F-test is splitting up the variability in the y variable into portions that show where the variability is coming from. The term analysis of variance describes exactly how you conduct a test of k population means. With the overall goal of testing whether k population (or treatment) means are equal, you take a random sample from each of the k populations. You first put all the data together into one big group and measure how much total variability there is; this variability is called the sum of squares total, or SSTO. If the data are really diverse, SSTO is large. If the data are very similar, SSTO is small.
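As a minimal sketch of that pooling step, the Python below lumps several samples together and adds up each value's squared distance from the overall mean. The three samples are made-up numbers (not data from this chapter), and the function name `ssto` is only an illustrative choice. Because SSTO is the numerator of the sample variance, it should also equal (N - 1) times the variance of the pooled data, which the last line prints for comparison.

```python
from statistics import fmean, variance

def ssto(groups):
    """Sum of squares total: pool every sample, then add up the squared
    distances of all values from the overall (grand) mean."""
    pooled = [x for group in groups for x in group]
    grand_mean = fmean(pooled)
    return sum((x - grand_mean) ** 2 for x in pooled)

# Hypothetical samples from k = 3 populations (illustrative values only)
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [5.0, 6.0, 7.0]]

total = ssto(groups)
pooled = [x for group in groups for x in group]
# SSTO is the numerator of s^2, so it matches (N - 1) * s^2 for the pooled data
print(total, (len(pooled) - 1) * variance(pooled))
```

More spread-out groups would push the pooled values farther from the grand mean and make SSTO larger; tightly clustered values make it smaller.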
Now the total variability in the combined data set (SSTO) can be split into two parts:
- SST: The variability between the groups, known as the sum of squares for treatment
- SSE: The variability within the groups, known as the sum of squares for error
This splitting up of the variability in your data results in one of the most important equalities in ANOVA: SSTO = SST + SSE.
The formula for SSTO is the numerator of the formula for $s^2$, the variance of a single data set, so

$$\text{SSTO} = \sum_{i}\sum_{j} \left(x_{ij} - \bar{x}\right)^2,$$

where $x_{ij}$ is the $j$th value in the sample from the $i$th population and $\bar{x}$ is the overall mean of the combined data. SSTO represents the total squared distance between the data values and their overall mean. The formula for SST is

$$\text{SST} = \sum_{i} n_i \left(\bar{x}_i - \bar{x}\right)^2,$$

where $n_i$ is the size of the sample coming from the $i$th population and $\bar{x}_i$ is that sample's mean. SST represents the total squared distance between the means from each sample and the overall mean, with each squared distance weighted by its sample size. The formula for SSE is

$$\text{SSE} = \sum_{i}\sum_{j} \left(x_{ij} - \bar{x}_i\right)^2,$$

where $x_{ij}$ is the $j$th value in the sample from the $i$th population and $\bar{x}_i$ is the mean of the sample coming from the $i$th population. This formula represents the total squared distance between the values in each sample and their corresponding sample means. Using algebra, you can show (with some serious elbow grease) that SSTO = SST + SSE.
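Here is a sketch of the standard argument behind that equality. Write $x_{ij}$ for the $j$th value from the $i$th sample, $\bar{x}_i$ for that sample's mean, $n_i$ for its size, and $\bar{x}$ for the overall mean. Add and subtract $\bar{x}_i$ inside each squared deviation, then expand:

$$
\begin{aligned}
\text{SSTO} &= \sum_{i}\sum_{j} \left[(x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x})\right]^2 \\
&= \sum_{i}\sum_{j} (x_{ij} - \bar{x}_i)^2 + \sum_{i} n_i (\bar{x}_i - \bar{x})^2 + 2\sum_{i} (\bar{x}_i - \bar{x}) \sum_{j} (x_{ij} - \bar{x}_i) \\
&= \text{SSE} + \text{SST} + 0
\end{aligned}
$$

The cross term drops out because, within each sample, the deviations from that sample's own mean add up to zero: $\sum_{j} (x_{ij} - \bar{x}_i) = 0$.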
The Minitab output for the watermelon seed-spitting contest for the four age groups is shown in Figure 9-3. Under the Source column of the ANOVA table, you see Factor listed in row one. The factor variable (as Minitab calls it) represents the treatment or population variable. In column three of the Factor row, you see the SST, which equals 89.75. In the Error row (row two), column three, you find the SSE, which equals 56.80. In row three (Total), column three, you see the SSTO, which is 146.55. Using the values of SST, SSE, and SSTO from the Minitab output, you can verify that SST + SSE = SSTO.
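With the Minitab numbers, that check is 89.75 + 56.80 = 146.55. To see the same identity computed from raw data, here is a minimal Python sketch of the three formulas. The raw watermelon data aren't reproduced here, so the groups are made-up values, and `sums_of_squares` is a hypothetical helper written for illustration, not a Minitab feature.

```python
from statistics import fmean

def sums_of_squares(groups):
    """Return (SST, SSE, SSTO) for a list of samples, one per population."""
    pooled = [x for group in groups for x in group]
    grand_mean = fmean(pooled)
    group_means = [fmean(group) for group in groups]

    # SST: each sample mean's squared distance from the grand mean, weighted by n_i
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # SSE: each value's squared distance from its own sample mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # SSTO: each value's squared distance from the grand mean
    ssto = sum((x - grand_mean) ** 2 for x in pooled)
    return sst, sse, ssto

# Hypothetical samples from k = 3 populations (illustrative values only)
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [5.0, 6.0, 7.0]]
sst, sse, ssto = sums_of_squares(groups)
print(f"SST={sst:.2f}  SSE={sse:.2f}  SSTO={ssto:.2f}  SST+SSE={sst + sse:.2f}")
```

For these made-up groups the sketch prints SST=14.00, SSE=6.00, SSTO=20.00, and SST+SSE=20.00, mirroring the way the Factor and Error rows of the Minitab table add up to the Total row.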