Page 75 - Statistics for Environmental Engineers
                       The Box-Cox Power Transformations
                       A power transformation model developed by Box and Cox (1964) can, so far as possible, satisfy the
                       conditions of normality and constant variance simultaneously. The method is applicable for almost any
                       kind of statistical model and any kind of transformation. The transformed value $Y_i^{(\lambda)}$ of the
                       original variable $y_i$ is:

                           $$Y_i^{(\lambda)} = \frac{y_i^{\lambda} - 1}{\lambda \, y_g^{\lambda - 1}}$$

                       where $y_g$ is the geometric mean of the original data series, and λ expresses the power of the
                       transformation. The geometric mean is obtained by averaging ln(y) and taking the exponential (antilog)
                       of the result. The special case when λ = 0 is the log transformation: $Y_i^{(0)} = y_g \ln(y_i)$.
                       The value λ = −1 is a reciprocal transformation, λ = 1/2 is a square root transformation, and λ = 1 is
                       no transformation. Example applications of this transformation are given in Box et al. (1978).
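
The transformation defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `geometric_mean` and `box_cox` and the three sample values are assumptions introduced here, and the data are assumed to be strictly positive.

```python
import math

def geometric_mean(y):
    """Average ln(y), then take the exponential (antilog) of the result."""
    return math.exp(sum(math.log(v) for v in y) / len(y))

def box_cox(y, lam):
    """Normalized Box-Cox transform of positive values y for power lam.

    Uses (y**lam - 1) / (lam * g**(lam - 1)), where g is the geometric
    mean of y, and g * ln(y) for the special case lam = 0.
    """
    g = geometric_mean(y)
    if lam == 0:
        return [g * math.log(v) for v in y]
    scale = lam * g ** (lam - 1)
    return [(v ** lam - 1) / scale for v in y]

# Illustrative values only (not taken from the text).
sample = [1.0, 2.0, 4.0]

for lam in (-1, 0, 0.5, 1):
    print(lam, box_cox(sample, lam))
```

With λ = 1 the result is simply y − 1, which shifts the data without changing their shape, so it matches the description of λ = 1 as no transformation.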
                       Example 7.5

                           Table 7.5 lists 36 measurements on cadmium (Cd) in soil, and their logarithms. The Cd
                           concentrations range from 0.005 to 0.094 mg/kg. The limit of detection was 0.01 mg/kg. Values
                           below this were arbitrarily replaced with 0.005 mg/kg. Comparisons must be made with other sets of similar data
                           and some transformation is needed before this can be done. Experience with environmental data
                           suggests that a log transformation may be useful, but something better might be discovered if
                           we make the Box-Cox transformation for several values of λ and compare the variances of the
                           transformed data.
                             The variance of the log-transformed values in Table 7.5 is $\sigma^2_{\ln(y)} = 0.549$. This cannot be
                           compared directly with the variance from, for instance, a square root transformation unless the
                           calculations are normalized to keep them on the same relative scale. The denominator of the
                           Box-Cox transformation, $\lambda y_g^{\lambda - 1}$, is a normalizing factor to make the variances
                           comparable across different
                                 TABLE 7.5
                                 Cadmium Concentrations in Soil
                                     Cadmium (mg/kg)                     ln[Cadmium]
                                 0.023    0.005     0.005     −3.7723     −5.2983     −5.2983
                                 0.020    0.005     0.032     −3.9120     −5.2983     −3.4420
                                 0.010    0.005     0.031     −4.6052     −5.2983     −3.4738
                                 0.020    0.013     0.005     −3.9120     −4.3428     −5.2983
                                 0.020    0.005     0.014     −3.9120     −5.2983     −4.2687
                                 0.020    0.094     0.020     −3.9120     −2.3645     −3.9120
                                 0.010    0.011     0.005     −4.6052     −4.5099     −5.2983
                                 0.010    0.005     0.027     −4.6052     −5.2983     −3.6119
                                 0.010    0.005     0.015     −4.6052     −5.2983     −4.1997
                                 0.010    0.028     0.034     −4.6052     −3.5756     −3.3814
                                 0.010    0.010     0.005     −4.6052     −4.6052     −5.2983
                                 0.005    0.018     0.013     −5.2983     −4.0174     −4.3428
                                 Average = 0.0161             Average of ln[Cd] = −4.42723
                                 Variance = 0.000255          Variance of ln[Cd] = 0.549
                                                              Geometric mean of Cd = 0.01195
                                 Note: Concentrations in mg/kg.
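
The summary statistics of Table 7.5, and the variance comparison the example proposes, can be reproduced with a short Python sketch. It is an illustration under stated assumptions: the grid of λ values is hypothetical (the text says only "several values of λ"), the helper `box_cox` is a name introduced here, and the variance is taken as the sample variance with an n − 1 divisor, which is the convention that reproduces the tabulated 0.549.

```python
import math
import statistics

# Cadmium concentrations (mg/kg) from Table 7.5, read row by row.
# Values below the detection limit appear as 0.005, as in the text.
cd = [
    0.023, 0.005, 0.005, 0.020, 0.005, 0.032, 0.010, 0.005, 0.031,
    0.020, 0.013, 0.005, 0.020, 0.005, 0.014, 0.020, 0.094, 0.020,
    0.010, 0.011, 0.005, 0.010, 0.005, 0.027, 0.010, 0.005, 0.015,
    0.010, 0.028, 0.034, 0.010, 0.010, 0.005, 0.005, 0.018, 0.013,
]

def box_cox(y, lam, g):
    """Normalized Box-Cox transform; g is the geometric mean of y."""
    if lam == 0:
        return [g * math.log(v) for v in y]
    scale = lam * g ** (lam - 1)
    return [(v ** lam - 1) / scale for v in y]

log_cd = [math.log(v) for v in cd]
g = math.exp(statistics.mean(log_cd))   # geometric mean of Cd
var_ln = statistics.variance(log_cd)    # sample variance of ln[Cd]

print(f"geometric mean = {g:.5f}")
print(f"variance of ln[Cd] = {var_ln:.3f}")

# Hypothetical grid of powers; the text does not specify which to try.
lambdas = [-1.0, -0.5, 0.0, 0.5, 1.0]
variances = {
    lam: statistics.variance(box_cox(cd, lam, g)) for lam in lambdas
}

for lam, var in variances.items():
    print(f"lambda = {lam:+.1f}  variance = {var:.3e}")
```

Because of the normalizing denominator, the variance reported for λ = 0 is the square of the geometric mean times the variance of ln[Cd], so it is on the same scale as the variances for the other powers rather than equal to 0.549 itself.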


                       © 2002 By CRC Press LLC