Page 163 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

4.5 Inference on More than Two Populations


           4.5.2 One-Way ANOVA

           4.5.2.1 Test Procedure

           The one-way ANOVA test is applied when only one grouping variable is present in
           the dataset, i.e., one has available c independent samples, corresponding to c
           categories (or levels) of an effect, and wants to assess whether or not the null
           hypothesis of equality of the population means should be rejected. As an example, one may have three independent
           samples of scores obtained by students in a certain course, corresponding to three
           different teaching methods, and want to assess whether or not the hypothesis of
           equality of student performance should be rejected. In this case, we have an effect
           – teaching method – with three categories.
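   To make the teaching-method example concrete, here is a minimal sketch in Python (the book itself works in SPSS, STATISTICA, MATLAB and R). The score values are invented, and the use of `scipy.stats.f_oneway` is an assumption for illustration:

```python
# A minimal sketch of the teaching-method example: three independent
# samples of student scores, one per teaching method (values invented).
from scipy import stats

method_a = [72, 85, 78, 90, 69, 81]
method_b = [75, 80, 74, 86, 79, 83]
method_c = [62, 70, 58, 73, 66, 64]

# One-way ANOVA: H0 is equality of the three population means.
f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value (e.g. below 0.05) leads to rejecting H0.
```
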
              A basic assumption for the variable X being tested is that the c independent
           samples are obtained from populations where X is normally distributed and with
           equal variance. Thus, the only possible difference among the populations refers to
           the means, $\mu_i$. The equality of variance tests were already described in section
           4.4.2. As to the normality assumption, if there are no “a priori” reasons to accept it,
           one can resort to the goodness of fit tests described in the following chapter.
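   Both assumptions can be screened in code before running the ANOVA. A sketch using `scipy.stats` on invented samples (Levene's test is one of the equality-of-variance tests of the kind referred to in section 4.4.2; the Shapiro-Wilk test is a goodness of fit test for normality):

```python
# Screening the one-way ANOVA assumptions on three invented samples:
# equal variances (Levene) and normality of each group (Shapiro-Wilk).
from scipy import stats

samples = [
    [72, 85, 78, 90, 69, 81],
    [75, 80, 74, 86, 79, 83],
    [62, 70, 58, 73, 66, 64],
]

# Levene's test: H0 is equality of the c population variances.
lev_stat, lev_p = stats.levene(*samples)
print(f"Levene: W = {lev_stat:.2f}, p = {lev_p:.3f}")

# Shapiro-Wilk test for each sample: H0 is that X is normally distributed.
for i, s in enumerate(samples, start=1):
    sw_stat, sw_p = stats.shapiro(s)
    print(f"sample {i}: W = {sw_stat:.3f}, p = {sw_p:.3f}")
```

   Large p-values give no grounds to reject the assumptions; small ones suggest using a variance-stabilising transformation or a non-parametric alternative.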
              In order to understand the ANOVA approach, we start by considering a single
           sample of size n, subdivided in c subsets of sizes $n_1, n_2, \ldots, n_c$, with averages
           $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_c$, and investigate how the total variance, v, can be expressed in terms
           of the subset variances, $v_i$. Let any sample value be denoted $x_{ij}$, the first index
           referring to the subset, $i = 1, 2, \ldots, c$, and the second index to the case number
           inside the subset, $j = 1, 2, \ldots, n_i$. The total variance is related to the total sum of
           squares, SST, of the deviations from the global sample mean, $\bar{x}$:

              $\mathrm{SST} = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$.                                    4.22
              Adding and subtracting $\bar{x}_i$ to the deviations, $x_{ij} - \bar{x}$, we derive:

              $\mathrm{SST} = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 + \sum_{i=1}^{c} \sum_{j=1}^{n_i} (\bar{x}_i - \bar{x})^2 + 2 \sum_{i=1}^{c} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)(\bar{x}_i - \bar{x})$.    4.23
              The last term can be proven to be zero. Let us now analyse the other two terms.
           The first term is called the within-group (or within-class) sum of squares, SSW,
           and represents the contribution to the total variance of the errors due to the random
           scattering of the cases around their group means. Because it reflects this random
           scattering, it is also called the experimental error or error sum of
           squares, SSE.
              The second term is called the between-group (or between-class) sum of squares,
           SSB, and represents the contribution to the total variance of the deviations of the
           group means from the global mean.
              Thus:

              SST = SSW + SSB.                                             4.24
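   The decomposition 4.24 can be checked numerically in pure Python on a small invented dataset of c = 3 groups with unequal sizes:

```python
# Verifying SST = SSW + SSB (equation 4.24) on invented data.
groups = [[3, 5, 4], [8, 9, 10, 9], [6, 7]]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)
group_means = [sum(g) / len(g) for g in groups]

# Total sum of squares: deviations from the global sample mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)

# Within-group sum of squares: cases around their group means.
ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

# Between-group sum of squares: group means around the global mean,
# each weighted by its group size n_i.
ssb = sum(len(g) * (m - grand_mean) ** 2
          for g, m in zip(groups, group_means))

print(f"SST = {sst:.4f}, SSW = {ssw:.4f}, SSB = {ssb:.4f}")
assert abs(sst - (ssw + ssb)) < 1e-9  # the cross term vanishes
```
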