Page 69 - Statistics for Environmental Engineers
7 Using Transformations
KEY WORDS antilog, arcsin, bacterial counts, Box-Cox transformation, cadmium, confidence interval,
geometric mean, transformations, linearization, logarithm, nonconstant variance, plankton counts,
power function, reciprocal, square root, variance stabilization.
There is usually no scientific reason why we should insist on analyzing data in their original scale of
measurement. Instead of doing our analysis on y, it may be more appropriate to look at log(y), √y, 1/y,
or some other function of y. These re-expressions of y are called transformations. Properly used,
transformations eliminate distortions and give each observation equal power to inform.
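As a minimal illustration of re-expressing y, the Python sketch below computes three of the transformations named above for a few hypothetical positive values. The helper name `reexpress` and the data are illustrative assumptions, not taken from the text.

```python
import math


def reexpress(values):
    """Return log10, square-root, and reciprocal re-expressions of positive values."""
    # log(y) and 1/y are undefined at zero, so this sketch requires strictly positive data.
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return {
        "log10": [math.log10(v) for v in values],
        "sqrt": [math.sqrt(v) for v in values],
        "reciprocal": [1.0 / v for v in values],
    }


# Hypothetical measurements on the original scale.
y = [4.0, 25.0, 100.0, 400.0]
for name, transformed in reexpress(y).items():
    print(name, [round(v, 4) for v in transformed])
```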
Making a transformation is not cheating. It is a common scientific practice for presenting and
interpreting data. A pH meter reads in logarithmic units, $\mathrm{pH} = -\log_{10}[\mathrm{H}^+]$, and not in
hydrogen ion concentration units. The instrument makes a data transformation that we accept as natural.
Light absorbance is measured on a logarithmic scale by a spectrophotometer and converted to a
concentration with the aid of a calibration curve. The calibration curve makes a transformation that is
accepted without hesitation. If we are dealing with bacterial counts, N, we think just as well in terms of
log(N) as N itself.
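The pH relationship and its antilog back-transformation can be sketched in a few lines of Python. The helper names `ph_from_h` and `h_from_ph` are hypothetical, and [H⁺] is assumed to be expressed in mol/L.

```python
import math


def ph_from_h(h_molar):
    """pH = -log10[H+], with the hydrogen ion concentration in mol/L."""
    return -math.log10(h_molar)


def h_from_ph(ph):
    """Antilog back-transformation: [H+] = 10**(-pH), in mol/L."""
    return 10.0 ** (-ph)


# Hypothetical concentrations spanning two orders of magnitude.
for h in (1e-6, 1e-7, 1e-8):
    print(f"[H+] = {h:.0e} mol/L  ->  pH = {ph_from_h(h):.2f}")
```

Because the scale is logarithmic, each tenfold change in [H⁺] shifts the pH by one unit; log10 of bacterial counts behaves the same way.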
There are three technical reasons for sometimes doing the calculations on a transformed scale: (1) to
make the spread equal in different data sets (to make the variances uniform); (2) to make the distribution
of the residuals normal; and (3) to make the effects of treatments additive¹ (Box et al., 1978). Equal
variance means having equal spread at the different settings of the independent variables or in the different
data sets that are compared. The requirement for a normal distribution applies to the measurement errors
and not to the entire sample of data. Transforming the data makes it possible to satisfy these requirements
when they are not satisfied by the original measurements.
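Reason (1) can be illustrated with a small Python sketch. It assumes, for illustration only, that the errors are multiplicative, meaning each group has the same relative scatter around its level; the data and the `group_sds` helper are hypothetical. Under that assumption the standard deviation grows in proportion to the level on the original scale, but it is identical for every group on the log scale, because log10(level × f) = log10(level) + log10(f) only shifts each group.

```python
import math
import statistics

# Hypothetical data: three groups at very different levels, each with the same
# relative (multiplicative) scatter, so the absolute spread grows with the level.
factors = [0.8, 0.9, 1.0, 1.1, 1.2]
groups = {level: [level * f for f in factors] for level in (10.0, 100.0, 1000.0)}


def group_sds(data):
    """Return {level: (sd on the original scale, sd on the log10 scale)}."""
    return {
        level: (
            statistics.stdev(values),
            statistics.stdev([math.log10(v) for v in values]),
        )
        for level, values in data.items()
    }


for level, (sd_raw, sd_log) in group_sds(groups).items():
    print(f"level {level:7.1f}:  sd(y) = {sd_raw:8.3f}   sd(log10 y) = {sd_log:.4f}")
```

On the original scale the three standard deviations are about 1.58, 15.8, and 158; on the log scale they are equal.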
Transformations for Linearization
Transformations are sometimes used to obtain a straight-line relationship between two variables. This
may involve, for example, using reciprocals, ratios, or logarithms. The left-hand panel of Figure 7.1 shows
the exponential growth of bacteria. Notice that the variance (spread) of the counts increases as the population
density increases. The right-hand panel shows that the data can be described by a straight line when plotted
on a log scale. Plotting on a log scale is equivalent to making a log transformation of the data.
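The straight-line relationship on the log scale can be sketched as follows. Exponential growth $N = N_0 e^{kt}$ becomes $\ln N = \ln N_0 + kt$ after taking logarithms, so an ordinary least-squares line fitted to ln N versus t estimates k as its slope. The counts below are hypothetical values generated exactly from that model (they are not the data of Figure 7.1), and `fit_line` is an illustrative helper.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


# Hypothetical, noise-free counts from N = N0 * exp(k t) with N0 = 20, k = 0.9.
t = [0.0, 1.0, 2.0, 3.0]
counts = [20.0 * math.exp(0.9 * ti) for ti in t]

# On the natural-log scale the model is a straight line: ln N = ln N0 + k t.
b0, k = fit_line(t, [math.log(c) for c in counts])
print(f"estimated N0 = {math.exp(b0):.2f}, estimated k = {k:.3f}")
```

With these noise-free values the fit recovers N0 = 20 and k = 0.9; with real counts the estimates would carry error.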
The important characteristic of the original data is the nonconstant variance, not the nonlinearity.
Nonconstant variance is a problem when a curve or line is fitted to the data by regression. Least-squares
regression places the line described by the model so as to minimize the sum of the squared distances
between the data points and the line. Points that are far from the line exert a strong effect because
squaring makes large distances count disproportionately. The result is that the precisely measured points
at time t = 1 have less influence on the position of the regression line than the poorly measured data at
t = 3. This gives too much influence to the least reliable data. We would prefer each data point to have
about the same amount of influence on the location of the line.
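Why squaring favors the poorly measured points can be shown with a small Python sketch. It assumes two hypothetical fitted values, 50 at t = 1 and 300 at t = 3, each observed 10% too high, that is, with the same relative error. On the original scale the point at t = 3 contributes about 97% of the residual sum of squares; on the log scale the two contributions are equal. The `ss_shares` helper is hypothetical, and the fitted values are assumed rather than taken from an actual fit.

```python
import math

# Hypothetical fitted values at t = 1 and t = 3; each observation is 10% above
# its fitted value, i.e., the same relative error at both times.
fitted = {1: 50.0, 3: 300.0}
observed = {t: 1.10 * f for t, f in fitted.items()}


def ss_shares(obs, fit, transform=lambda v: v):
    """Fraction of the residual sum of squares contributed by each point."""
    squared = {t: (transform(obs[t]) - transform(fit[t])) ** 2 for t in obs}
    total = sum(squared.values())
    return {t: s / total for t, s in squared.items()}


print("original scale:", ss_shares(observed, fitted))
print("log scale:     ", ss_shares(observed, fitted, math.log))
```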
In this example, the log-transformed data have constant variance at the different population levels. Each data
¹ For example, if $y = x^a z^b$, a log transformation gives $\log y = a \log x + b \log z$. Now the effects of
factors x and z are additive. See Box et al. (1978) for an example of how this can be useful.
© 2002 By CRC Press LLC