Page 63 - Statistics for Environmental Engineers


L1592_frame_C06 Page 55 Tuesday, December 18, 2001 1:43 PM

External Reference Distributions

KEY WORDS histogram, reference distribution, moving average, normal distribution, serial correlation,
t distribution.

When data are analyzed to decide whether conditions are as they should be, or whether the level of some
variable has changed, the fundamental strategy is to compare the current condition or level with an
appropriate reference distribution. The reference distribution shows how things should be, or how they
used to be. Sometimes an external reference distribution should be created, instead of simply using one
of the well-known and nicely tabulated statistical reference distributions, such as the normal or
t distribution. Most statistical methods that rely upon these distributions assume that the data are random,
normally distributed, and independent. Many sets of environmental data violate these requirements.

A specially constructed reference distribution will not be based on assumptions about properties of
the data that may not be true. It will be based on the data themselves, whatever their properties. If
serial correlation or nonnormality affects the data, it will be incorporated into the external reference
distribution.

Making the reference distribution is conceptually and mathematically simple. No particular knowledge
of statistics is needed, and the only mathematics used is counting and simple arithmetic. Despite this
simplicity, the concept is statistically elegant, and valid judgments about statistical significance can be
made.

Constructing an External Reference Distribution

The first 130 observations in Figure 6.1 show the natural background pH in a stream. Table 6.1 lists the
data. Suppose that a new effluent has been discharged to the stream and someone suggests it is depressing
the stream pH. A survey to check this has provided ten additional consecutive measurements: 6.66, 6.63,
6.82, 6.84, 6.70, 6.74, 6.76, 6.81, 6.77, and 6.67. Their average is 6.74. We wish to judge whether this
group of observations differs from past observations. These ten values are plotted as open circles on the
right-hand side of Figure 6.1. They do not appear to be unusual, but a careful comparison should be
made with the historical data.
The obvious comparison is between the 6.74 average of the ten new values and the 6.80 average of the
previous 130 pH values. One reason not to do this is that the standard procedure for comparing two averages,
the t-test, assumes that the observations are independent of one another in time. Data that form a time series,
like these pH data, usually are not independent: adjacent values are related to each other. The data are
serially correlated (autocorrelated), and the t-test is not valid unless something is done to account for this
correlation. To avoid making any assumption about the structure of the data, the average of 6.74 should
be compared with a reference distribution for averages of sets of ten consecutive observations.
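
One way to see numerically what "serially correlated" means is the lag-1 sample autocorrelation, r1, which measures how strongly each observation resembles the one that follows it. The text does not prescribe this diagnostic; the sketch below, built around a hypothetical function `lag1_autocorrelation`, is only an illustration. Values of r1 near zero are roughly consistent with independence, while clearly positive values mean that high readings tend to follow high readings.

```python
def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a time series.

    r1 = sum_{t=1}^{n-1} (x_t - m)(x_{t+1} - m) / sum_{t=1}^{n} (x_t - m)^2,
    where m is the mean of all n observations.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    m = sum(x) / n
    # Denominator: total sum of squared deviations from the mean.
    den = sum((v - m) ** 2 for v in x)
    if den == 0:
        raise ValueError("series is constant; r1 is undefined")
    # Numerator: products of deviations for each adjacent pair.
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    return num / den


# Illustrative series only (not the pH data): a steady upward drift.
trend = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(round(lag1_autocorrelation(trend), 3))  # prints 0.7
```

A drifting series like this one is strongly autocorrelated even though no single value looks unusual, which is exactly the situation in which the t-test's independence assumption fails.
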
Table 6.1 gives the 121 averages of ten consecutive observations that can be calculated from the
historical data (130 - 10 + 1 = 121 overlapping windows). The ten-day moving averages are plotted in
Figure 6.2. Figure 6.3 is a reference distribution for these averages. Six of the 121 ten-day averages are
6.74 or lower, so about 95% (115 of 121) of the ten-day averages are larger than 6.74. Having only 5% of
past ten-day averages at this level or lower indicates that the stream pH may have changed.
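
The whole construction needs only a moving average and a count. The sketch below assumes the 130 historical pH values are available as a Python list (for example, typed in from Table 6.1, which is not reproduced here); the function names `moving_averages` and `lower_tail_fraction` are hypothetical. Each overlapping window of ten consecutive observations gives one average, and the judgment of significance is simply the proportion of those averages at or below the new average.

```python
def moving_averages(series, window=10):
    """Averages of every run of `window` consecutive observations.

    A series of n values yields n - window + 1 overlapping averages.
    """
    if window < 1 or len(series) < window:
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def lower_tail_fraction(reference, value, ndigits=None):
    """Count and proportion of reference values at or below `value`.

    If `ndigits` is given, values are rounded first so that ties are judged
    at the reported precision (an assumption about how ties are handled).
    """
    if ndigits is not None:
        reference = [round(r, ndigits) for r in reference]
        value = round(value, ndigits)
    count = sum(1 for r in reference if r <= value)
    return count, count / len(reference)


# Illustrative use with a made-up series (NOT the stream pH data).
demo = [float(i) for i in range(130)]
ref = moving_averages(demo, window=10)
print(len(ref))                           # prints 121
count, frac = lower_tail_fraction(ref, 4.5)
print(count)                              # prints 1

# With the real data one would call, for example:
#   count, frac = lower_tail_fraction(moving_averages(historical_ph), 6.74, ndigits=2)
# where historical_ph is a hypothetical list holding the 130 Table 6.1 values.
```

For the stream data, the text reports a count of 6, so the tail proportion is 6/121, or about 0.050. No distributional assumption enters the calculation; any serial correlation in the historical record is carried into the reference distribution automatically.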
