Page 63 - Statistics for Environmental Engineers










                       6




                       External Reference Distributions






                       KEY WORDS  histogram, reference distribution, moving average, normal distribution,
                       serial correlation, t distribution.

                       When data are analyzed to decide whether conditions are as they should be, or whether the level of some
                       variable has changed, the fundamental strategy is to compare the current condition or level with an
                       appropriate reference distribution. The reference distribution shows how things should be, or how they
                       used to be. Sometimes an external reference distribution should be created, instead of simply using one
                       of the well-known and nicely tabulated statistical reference distributions, such as the normal or t
                       distribution. Most statistical methods that rely upon these distributions assume that the data are random,
                       normally distributed, and independent. Many sets of environmental data violate these requirements.
                        A specially constructed reference distribution will not be based on assumptions about properties of
                       the data that may not be true. It will be based on the data themselves, whatever their properties. If
                       serial correlation or nonnormality affects the data, it will be incorporated into the external reference
                       distribution.
                        Making the reference distribution is conceptually and mathematically simple. No particular knowledge
                       of statistics is needed, and the only mathematics used are counting and simple arithmetic. Despite this
                       simplicity, the concept is statistically elegant, and valid judgments about statistical significance can be
                       made.



                       Constructing an External Reference Distribution

                       The first 130 observations in Figure 6.1 show the natural background pH in a stream. Table 6.1 lists the
                       data. Suppose that a new effluent has been discharged to the stream and someone suggests it is depressing
                       the stream pH. A survey to check this has provided ten additional consecutive measurements: 6.66, 6.63,
                       6.82, 6.84, 6.70, 6.74, 6.76, 6.81, 6.77, and 6.67. Their average is 6.74. We wish to judge whether this
                       group of observations differs from past observations. These ten values are plotted as open circles on the
                       right-hand side of Figure 6.1. They do not appear to be unusual, but a careful comparison should be
                       made with the historical data.
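
                       The average quoted above can be checked with a few lines of arithmetic. The sketch
                       below uses only the ten survey values given in the text; the variable names are
                       illustrative.

```python
# Ten consecutive pH measurements from the survey described in the text.
new_ph = [6.66, 6.63, 6.82, 6.84, 6.70, 6.74, 6.76, 6.81, 6.77, 6.67]

# Arithmetic mean of the ten new observations.
new_mean = sum(new_ph) / len(new_ph)

print(f"{new_mean:.2f}")  # prints 6.74
```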
                        The obvious comparison is the 6.74 average of the ten new values with the 6.80 average of the previous
                       130 pH values. One reason not to do this is that the standard procedure for comparing two averages,
                       the t-test, is based on the data being independent of each other in time. Data that are a time series, like
                       these pH data, usually are not independent. Adjacent values are related to each other. The data are serially
                       correlated (autocorrelated), and the t-test is not valid unless something is done to account for this
                       correlation. To avoid making any assumption about the structure of the data, the average of 6.74 should
                       be compared with a reference distribution for averages of sets of ten consecutive observations.
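
                       One way to see why serial correlation matters is the standard expression for the
                       variance of an average of n observations from a stationary series. The formula is not
                       stated in the text and is offered only as background. The symbols are assumptions of
                       this sketch: sigma^2 is the variance of a single observation and rho_k is the
                       autocorrelation at lag k.

```latex
\operatorname{Var}(\bar{y})
  = \frac{\sigma^{2}}{n}
    \left[ 1 + 2 \sum_{k=1}^{n-1} \left( 1 - \frac{k}{n} \right) \rho_{k} \right]
```

                       When every rho_k is zero, this reduces to the familiar sigma^2/n on which the t-test
                       relies. With positive autocorrelation the bracketed term exceeds 1, so the average is
                       more variable than the t-test assumes.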
                        Table 6.1 gives the 121 averages of ten consecutive observations that can be calculated from the
                       historical data. The ten-day moving averages are plotted in Figure 6.2. Figure 6.3 is a reference
                       distribution for these averages. Six of the 121 ten-day averages are 6.74 or lower, so about 95% of
                       the ten-day averages are larger than 6.74. Having only 5% of past ten-day averages at this level or
                       lower indicates that the stream pH may have changed.
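
                       The counting procedure just described can be sketched in a few lines of Python. This
                       is a minimal illustration, not the authors' own code. The function names
                       moving_averages and fraction_at_or_below are hypothetical, and the short toy series is
                       made up, because the 130 historical values of Table 6.1 are not reproduced here. A
                       series of n observations yields n - w + 1 averages of w consecutive values, which for
                       n = 130 and w = 10 is the 121 averages mentioned above.

```python
def moving_averages(values, window):
    """Return the average of every run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def fraction_at_or_below(reference, value):
    """Fraction of the reference averages that are <= value."""
    count = sum(1 for r in reference if r <= value)
    return count / len(reference)


# Made-up placeholder series (NOT the Table 6.1 data), for illustration only.
toy_series = [6.8, 6.9, 6.7, 6.8, 7.0, 6.6, 6.9, 6.8]
toy_reference = moving_averages(toy_series, window=3)

# Share of past 3-value averages at or below a hypothetical new average.
share = fraction_at_or_below(toy_reference, 6.75)
print(len(toy_reference), share)
```

                       Applied to the real historical series with a window of ten, the same count gives the 6
                       out of 121 (about 0.05) reported in the text.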


                       © 2002 By CRC Press LLC