Now that you have calculated SSTO and SSE, you need the bridge between them. That is, you need a formula that connects the variability in the y_i's (SSTO) and the variability in the residuals after fitting the regression line (SSE). That bridge is SSR (equivalent to SST in ANOVA). In regression, ŷ_i represents the predicted value of y_i based on the regression model. These are the values on the regression line. To assess how much this regression line helps to predict the y-values, you compare it to the model you would get without any x variable in it.
Without any other information, the only thing you can do to predict y is look at the average, ȳ. So, SST compares the predicted value from the regression line to the predicted value from the flat line (the mean of the y's) by subtracting them. The result is ŷ_i - ȳ. Square each result and sum them all up, and you get the formula for SST.
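Putting those descriptions into symbols (with y_i for an observed value, ŷ_i for its predicted value on the line, and ȳ for the mean of the y's), the three sums of squares work out to

   SSTO = Σ(y_i - ȳ)²
   SSE  = Σ(y_i - ŷ_i)²
   SST  = Σ(ŷ_i - ȳ)²   (called SSR in regression, as explained next)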
                                                    Now for one last hoop to jump through (as if you haven’t had enough
                                                    already). Instead of calling the sum of squares for the regression model SST
                                                    as is done in ANOVA, statisticians call it SSR for sum of squares regression.
                                                    Consider SSR from regression to be equivalent to the SST from ANOVA.
This matters because computer output lists the sums of squares for the regression model as SSR, not SST.
To summarize the sums of squares as they apply to regression, you have SSTO = SSR + SSE (illustrated numerically in the sketch after this list), where

- SSTO measures the variability in the observed y-values around their mean. (This is the quantity you divide by n - 1 to get the variance of the y-values.)

- SSE measures the variability between the predicted values for y (the values on the line) and the observed y-values. SSE represents the variability left over after the line has been fit to the data.

- SSR measures the variability in the predicted values for y (the values on the line) from the mean of y. SSR is the sum of squares due to the regression model (the line) itself.
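Minitab does this arithmetic for you, but if you want to see the identity SSTO = SSR + SSE with your own numbers, here is a minimal sketch in Python (the data and variable names are made up purely for illustration; this is not the book's example):

   # Minimal sketch (not Minitab output): computing SSTO, SSE, and SSR for a
   # simple linear regression and checking that SSTO = SSR + SSE.
   import numpy as np

   # Small made-up data set, for illustration only.
   x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
   y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

   # Fit the least-squares regression line y-hat = b0 + b1 * x.
   b1, b0 = np.polyfit(x, y, 1)          # slope and intercept
   y_hat = b0 + b1 * x                   # predicted values (points on the line)
   y_bar = y.mean()                      # the "flat line" model: the mean of y

   ssto = np.sum((y - y_bar) ** 2)       # total variability around the mean
   sse = np.sum((y - y_hat) ** 2)        # variability left over after the fit
   ssr = np.sum((y_hat - y_bar) ** 2)    # variability due to the line itself

   print(f"SSTO = {ssto:.4f}")
   print(f"SSE  = {sse:.4f}")
   print(f"SSR  = {ssr:.4f}")
   print(f"SSR + SSE = {ssr + sse:.4f}")  # matches SSTO (up to rounding)

Because the predicted values come from the least-squares line, SSR + SSE always adds back up to SSTO (apart from rounding), which is exactly the breakdown described above.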
                                                    Minitab calculates all the sums of squares for you as part of the regression
                                                    analysis. You can see this calculation in the section “Bringing regression to
                                                    the ANOVA table.”
                                                    Dividing up the degrees of freedom
In ANOVA, you test a model for the treatment (population) means by using an F-test, which is F = MST / MSE. To get MST (the mean sum of squares for treatment), you take SST (the sum of squares for treatment) and divide by its degrees of