Page 233 - Statistics for Environmental Engineers
of the n = 80 observations. This is also called the total adjusted sum of squares (corrected for the mean).
Each of the n observations provides one degree of freedom. One of them is consumed in computing the
grand average, leaving n − 1 degrees of freedom available to assign to each of the factors that contribute
variability. The Total SS and its n − 1 degrees of freedom are separated into contributions from the factors
controlled in the experimental design. For the dioxin/furan emissions experiment, these sums of squares
(SS) are:
Total SS = Periods SS + Samplers SS + Dioxin/Furan SS + Chlorination SS
+ Interaction(s) SS + Error SS
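The bookkeeping can be checked on a small case. The sketch below is not the dioxin/furan experiment: it uses an invented 3 × 4 table with two hypothetical factors (A and B) and one observation per cell, and it verifies that the total sum of squares (corrected for the mean) and its n − 1 degrees of freedom split exactly into the factor contributions plus a residual.

```python
# Hypothetical 3 x 4 layout: rows are levels of factor A, columns are levels
# of factor B, one observation per cell. The numbers are invented.
y = [
    [4.1, 5.0, 6.2, 5.5],
    [3.8, 4.6, 5.9, 5.1],
    [4.9, 5.7, 7.0, 6.4],
]
a, b = len(y), len(y[0])
n = a * b

# Grand average (consumes one degree of freedom) and marginal means.
grand = sum(sum(row) for row in y) / n
row_means = [sum(row) / b for row in y]
col_means = [sum(y[i][j] for i in range(a)) / a for j in range(b)]

# Total sum of squares, corrected for the mean.
total_ss = sum((y[i][j] - grand) ** 2 for i in range(a) for j in range(b))

# Contributions of the two factors.
ss_a = b * sum((m - grand) ** 2 for m in row_means)
ss_b = a * sum((m - grand) ** 2 for m in col_means)

# What is left after removing both main effects (interaction plus error,
# which cannot be separated without replication).
resid_ss = sum(
    (y[i][j] - row_means[i] - col_means[j] + grand) ** 2
    for i in range(a)
    for j in range(b)
)

# Degrees of freedom split the same way.
df_total = n - 1
df_a, df_b = a - 1, b - 1
df_resid = (a - 1) * (b - 1)

print(f"Total SS = {total_ss:.4f}, df = {df_total}")
print(f"A SS = {ss_a:.4f}, B SS = {ss_b:.4f}, Residual SS = {resid_ss:.4f}")
```

The same identity holds for a balanced four-factor design; only the number of terms on the right-hand side grows.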
Another approach is to specify a general model to describe the data. It might be simple, such as:
$$y_{ijkl} = \bar{y} + \alpha_i + \beta_j + \gamma_k + \lambda_l + (\text{interaction terms}) + e_i$$
where the Greek letters indicate the true response due to the four factors and $e_i$ is the random residual
error of the ith observation. The residual errors are assumed to be independent and normally distributed
with mean zero and constant variance $\sigma^2$ (Rao, 1965; Box et al., 1978).
The assumptions of independence, normality, and constant variance are not equally important to the
ANOVA. Scheffe (1959) states, “In practice, the statistical inferences based on the above model are not
seriously invalidated by violation of the normality assumption, nor,…by violation of the assumption of
equality of cell variances. However, there is no such comforting consideration concerning violation of the
assumption of statistical independence, except for experiments in which randomization has been
incorporated into the experimental procedure.”
If measurements had been replicated, it would be possible to make a direct estimate of the error
variance ($\sigma^2$). In the absence of replication, the usual practice is to use the higher-order interactions
as estimates of $\sigma^2$. This is justified by assuming, for example, that the fourth-order interaction has no
meaningful physical interpretation. It is also common that third-order interactions have no physical
significance. If sums of squares of third-order interactions are of the same magnitude as the fourth-order
interaction, they can be pooled to obtain an estimate of $\sigma^2$ that has more degrees of freedom.
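Pooling amounts to adding the sums of squares and adding the degrees of freedom of the interaction terms judged to be noise. The sketch below uses invented SS and df values (they are not taken from the dioxin/furan data), and the helper name `pooled_error_ms` is hypothetical. It forms the pooled error mean square and the F ratio for one illustrative main effect.

```python
def pooled_error_ms(terms):
    """Pool (sum_of_squares, degrees_of_freedom) pairs into one error estimate.

    Returns the pooled mean square (the estimate of sigma^2) and its
    degrees of freedom.
    """
    pooled_ss = sum(ss for ss, _ in terms)
    pooled_df = sum(df for _, df in terms)
    return pooled_ss / pooled_df, pooled_df


# Hypothetical higher-order interaction terms: (SS, df). Invented values.
high_order_terms = [
    (0.96, 12),  # highest-order interaction
    (0.30, 4),   # a third-order interaction of similar size per df
    (0.24, 3),   # another third-order interaction
]

error_ms, error_df = pooled_error_ms(high_order_terms)

# Hypothetical main effect to be judged against the pooled error.
effect_ss, effect_df = 2.40, 3
effect_ms = effect_ss / effect_df
f_ratio = effect_ms / error_ms

print(f"Pooled error MS = {error_ms:.4f} on {error_df} df")
print(f"F = {f_ratio:.2f} on ({effect_df}, {error_df}) df")
```

The gain from pooling is in the denominator degrees of freedom: the F ratio is referred to a distribution with 19 rather than 12 error degrees of freedom in this invented case.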
Because no one is likely to manually do the computations for a four-factor analysis of variance, we
assume that results are available from some commercial statistical software package. The analysis that
follows emphasizes variance decomposition and interpretation rather than model specification.
The first requirement for using available statistical software is recognizing whether the problem to be
solved is one-way ANOVA, two-way ANOVA, etc. This is determined by the number of factors that are
considered. In the example problem there are four factors: S, P, DF, and CL. It is therefore a four-way
ANOVA.
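A four-way ANOVA has 15 effects: four main effects, six two-factor interactions, four three-factor interactions, and one four-factor interaction. The sketch below lists them and their degrees of freedom. The factor labels come from the text, but the numbers of levels are assumptions, chosen only so that their product equals the n = 80 observations of an unreplicated design; the actual levels are not stated in this section.

```python
from itertools import combinations
from math import prod

# Factor labels from the text; the level counts are HYPOTHETICAL.
levels = {"S": 2, "P": 4, "DF": 2, "CL": 5}

# Each effect's df is the product of (levels - 1) over the factors involved.
effects = {}
for order in range(1, len(levels) + 1):
    for combo in combinations(levels, order):
        effects["x".join(combo)] = prod(levels[f] - 1 for f in combo)

n = prod(levels.values())  # one observation per cell (no replication)

for name, df in effects.items():
    print(f"{name:12s} df = {df}")
print(f"Sum of effect df = {sum(effects.values())}, n - 1 = {n - 1}")
```

With no replication the effect degrees of freedom add up to n − 1, so nothing is left over for a separate error term.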
In practice, such a complex experiment would be designed in consultation with a statistician, in which
case the method of data analysis is determined by the experimental design. The investigator will have
no need to guess which method of analysis, or which computer program, will suit the data. As a corollary,
we also recommend that happenstance data (data from unplanned experiments) should not be subjected
to analysis of variance because, in such data sets, randomization will almost certainly not have been
incorporated.
Dioxin Case Study Results
The ANOVA calculations were done on the natural logarithm of the concentrations because this
transformation tended to make the variance more nearly constant, as the analysis assumes.
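One common reason a log transformation helps is that measurement errors are roughly proportional to the concentration. The sketch below illustrates that mechanism with invented numbers; it is an assumption about how such data can behave, not a description of the dioxin/furan measurements.

```python
import math
from statistics import stdev

# Hypothetical true concentrations and multiplicative error factors.
true_levels = [1.0, 10.0, 100.0]
error_factors = [0.8, 1.0, 1.25]

# Three invented "replicates" at each concentration.
groups = [[m * f for f in error_factors] for m in true_levels]

# Spread on the original scale grows with the concentration...
raw_sd = [stdev(g) for g in groups]

# ...but is the same at every concentration on the log scale,
# because log(m * f) = log(m) + log(f).
log_sd = [stdev([math.log(v) for v in g]) for g in groups]

for m, r, s in zip(true_levels, raw_sd, log_sd):
    print(f"level {m:6.1f}: raw sd = {r:8.4f}, log sd = {s:.4f}")
```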
The results shown in Table 26.2 are the complete variance decomposition, specifying all sums of squares
(SS) and degrees of freedom (df) for the main effects of the four factors and all interactions between
the four factors. These are produced by any computer program capable of handling a four-way ANOVA
© 2002 By CRC Press LLC