Page 134 - Statistics for Environmental Engineers





                       The Winsorized Mean
                       Winsorization can be used to estimate the mean and standard deviation of a distribution even when the
                       data set has a few missing or unreliable values at either or both ends of the distribution.

                       Example 15.3

                           Again using the data in Example 15.1, replace the four censored values with the next larger value,
                           which is 6.1. Replace the four largest values with the next smaller value, which is 8.9. The replaced
                           values are the first four 6.1 entries and the last four 8.9 entries. This gives the Winsorized
                           sample (n = 27) below:

                                     6.1   6.1   6.1   6.1   6.1   6.3   6.5   6.7   6.9
                                     7.2   7.3   7.4   7.5   7.6   7.7   7.8   7.9   8.0
                                     8.1   8.3   8.5   8.7   8.9   8.9   8.9   8.9   8.9

                           Compute the sample mean ($\bar{y}$) and standard deviation ($s$) of the resulting Winsorized sample
                           in the usual way:

                                     $$\bar{y} = 7.53\ \mu\text{g/L}, \qquad s = 1.022\ \mu\text{g/L}$$


                           The Winsorized mean ($\bar{y}_w$) is the mean of the Winsorized sample:

                                     $$\bar{y}_w = \bar{y} = 7.53\ \mu\text{g/L}$$


                           The Winsorized standard deviation ($s_w$) is an approximately unbiased estimator of $\sigma$ and is
                           computed from the standard deviation of the Winsorized sample as:

                                     $$s_w = \frac{s(n - 1)}{\nu - 1} = \frac{1.022(27 - 1)}{19 - 1} = 1.48\ \mu\text{g/L}$$

                           where n is the total number of observations and ν is the number of observations not replaced
                           during Winsorization. ν = 27 – 4 – 4 = 19 because the four “less-than” values and the four largest
                           values have been replaced.
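
                           The arithmetic of Example 15.3 can be checked with a short script. This is a minimal sketch, not
                           part of the original example: the function name winsorized_stats and the constants K_LOW and
                           K_HIGH are illustrative assumptions. It rebuilds the Winsorized sample from the 19 unreplaced
                           values by padding each end with four copies of the nearest retained value, then applies the
                           formulas above.

```python
import math

# The 19 observations left unchanged in Example 15.3 (ug/L).
retained = [
    6.1, 6.3, 6.5, 6.7, 6.9, 7.2, 7.3, 7.4, 7.5, 7.6,
    7.7, 7.8, 7.9, 8.0, 8.1, 8.3, 8.5, 8.7, 8.9,
]
K_LOW = 4   # censored values replaced by the smallest retained value
K_HIGH = 4  # largest values replaced by the largest retained value


def winsorized_stats(retained, k_low, k_high):
    """Return (mean, s, s_w, n, nu) for the Winsorized sample.

    Hypothetical helper: pads the retained values with k_low copies of
    the smallest and k_high copies of the largest retained value.
    """
    values = sorted(retained)
    sample = [values[0]] * k_low + values + [values[-1]] * k_high
    n = len(sample)    # total number of observations
    nu = len(values)   # observations not replaced during Winsorization
    mean = sum(sample) / n
    # Ordinary sample standard deviation of the Winsorized sample.
    s = math.sqrt(sum((y - mean) ** 2 for y in sample) / (n - 1))
    # Winsorized standard deviation: s_w = s (n - 1) / (nu - 1).
    s_w = s * (n - 1) / (nu - 1)
    return mean, s, s_w, n, nu


y_w, s, s_w, n, nu = winsorized_stats(retained, K_LOW, K_HIGH)
print(f"n = {n}, nu = {nu}")
print(f"Winsorized mean = {y_w:.2f} ug/L")
print(f"s of Winsorized sample = {s:.3f} ug/L")
print(f"Winsorized standard deviation = {s_w:.2f} ug/L")
```

                           Carrying the unrounded value of s through the calculation is what gives 1.48 µg/L for the
                           Winsorized standard deviation.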
                       If the data are from a normal distribution, the upper and lower limits of a two-sided 100(1 − α)%
                       confidence interval of the mean η are:


                                     $$\bar{y}_w \pm t_{\nu-1,\,\alpha/2}\,\frac{s_w}{\sqrt{n}}$$


                       where $t_{\nu-1,\alpha/2}$ cuts 100(α/2)% off each tail of the t distribution with ν – 1 degrees of freedom. Note that
                       the degrees of freedom are ν – 1 instead of the usual n – 1, and $s_w$ replaces $s$.
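
                       As an illustration, the interval can be evaluated for the values in Example 15.3. This is a sketch
                       under stated assumptions: the 95% confidence level is chosen here for illustration and is not
                       specified in the text, the critical value 2.101 is the tabulated t value for 18 degrees of freedom
                       and α/2 = 0.025, and the function name winsorized_ci is hypothetical.

```python
import math


def winsorized_ci(y_w, s_w, n, t_crit):
    """Two-sided interval y_w +/- t_crit * s_w / sqrt(n).

    t_crit must be the t value for nu - 1 degrees of freedom,
    where nu is the number of observations not replaced.
    """
    half_width = t_crit * s_w / math.sqrt(n)
    return y_w - half_width, y_w + half_width


# Rounded values reported in Example 15.3 (ug/L).
Y_W = 7.53   # Winsorized mean
S_W = 1.48   # Winsorized standard deviation
N = 27       # total number of observations
NU = 19      # observations not replaced

# Assumption: 95% confidence (alpha = 0.05), so alpha/2 = 0.025.
# Tabulated t value for NU - 1 = 18 degrees of freedom.
T_CRIT = 2.101

lower, upper = winsorized_ci(Y_W, S_W, N, T_CRIT)
print(f"95% confidence interval: {lower:.2f} to {upper:.2f} ug/L")
```

                       The divisor under the square root is n, while the t value uses ν – 1 degrees of freedom, so the
                       interval is wider than one computed as if all 27 values had been observed.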
                       The trimmed mean and the Winsorized mean are best used on symmetric data sets that have either
                       missing (censored) or unreliable data in the tails of the distribution. If the distribution is symmetric, they
                       give unbiased estimates of the true mean.
                       © 2002 By CRC Press LLC