Page 67 - Statistics for Environmental Engineers
Setting Critical Levels
The reference distribution shows at a glance which values are exceptionally high or low. What is meant
by “exceptional” can be specified by setting critical decision levels that have a specified probability
value. For example, one might specify exceptional as the level that is exceeded p percent of the time.
The reference distribution for daily observations during stable operation (bottom panel in Figure 6.5)
is based on 1150 daily values representing stable performance. The critical upper 5% level is a BOD
concentration of 17 mg/L. This is found by summing the frequencies, starting from the highest BOD
observed during stable operation, until the accumulated percentage reaches approximately 5%. In this case, the
probability that the BOD is 20 is P(BOD = 20) = 0.8%. Also, P(BOD = 19) = 0.8%, P(BOD = 18) = 1.6%,
and P(BOD = 17) = 1.6%. The sum of these percentages is 4.8%. So, as a practical matter, we can say
that the BOD exceeds 16 mg/L only about 5% of the time when operation is stable.
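The accumulation from the top of the distribution can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name upper_critical_level is made up for this example, the percentages for 17 to 20 mg/L are the ones quoted above, and the entry for 16 mg/L is a hypothetical placeholder (the text does not give it). The stopping rule used here, keeping the lowest level whose accumulated percentage does not exceed the target, is one reasonable reading of the procedure.

```python
def upper_critical_level(pct_by_value, target_pct):
    """Accumulate percentages from the highest value downward.

    Returns (level, tail_pct), where level is the lowest value whose
    accumulated upper-tail percentage does not exceed target_pct.
    Returns (None, 0.0) if even the highest value alone exceeds the target.
    """
    tail = 0.0
    level, level_tail = None, 0.0
    for value in sorted(pct_by_value, reverse=True):
        tail += pct_by_value[value]
        if tail > target_pct + 1e-9:  # small tolerance for float rounding
            break
        level, level_tail = value, tail
    return level, level_tail


# Percentages for 17-20 mg/L are quoted in the text.
# The 16 mg/L entry is a HYPOTHETICAL placeholder for illustration only.
upper_tail_pct = {20: 0.8, 19: 0.8, 18: 1.6, 17: 1.6, 16: 3.0}

level, tail_pct = upper_critical_level(upper_tail_pct, 5.0)
print(level, round(tail_pct, 1))
```

With these inputs the accumulated percentage stops at 4.8% for a level of 17 mg/L, which matches the statement that the BOD exceeds 16 mg/L about 5% of the time.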
Upper critical levels can be set for the MA(7) reference distribution as well. The probability that a
7-day moving average, MA(7), of 14 mg/L or higher will occur when the treatment plant is stable is 4%. An MA(7)
greater than 13 mg/L serves warning that the process is performing poorly and may be upset. By definition,
5% of such warnings will be false alarms. A two-level warning system could be devised, for example,
by using the upper 1% and the upper 5% levels. The upper 1% level, which is about 16 mg/L, is a signal
that something is almost certainly wrong; it will be a false alarm in only 1 out of 100 alerts.
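A two-level warning scheme of this kind might be sketched as follows. The levels of 13 mg/L and 16 mg/L are taken from the text; the function names, the labels "ok", "warning", and "alarm", the use of a strict "greater than" comparison at both levels, and the daily BOD values are all assumptions made for this illustration.

```python
def moving_average(values, window=7):
    """Trailing moving average; one result per complete window."""
    return [
        sum(values[i - window:i]) / window
        for i in range(window, len(values) + 1)
    ]


def classify(ma, warn_level=13.0, alarm_level=16.0):
    """Label an MA(7) value against the upper 5% and upper 1% levels."""
    if ma > alarm_level:
        return "alarm"    # upper 1% level: something is almost certainly wrong
    if ma > warn_level:
        return "warning"  # upper 5% level: process may be upset
    return "ok"


# HYPOTHETICAL daily effluent BOD values (mg/L), for illustration only.
daily_bod = [8, 9, 10, 9, 8, 10, 9, 30, 28, 25, 27, 30, 29, 31]

for ma in moving_average(daily_bod, window=7):
    print(f"MA(7) = {ma:5.2f} mg/L -> {classify(ma)}")
```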
There is a balance to be found between having occasional false alarms and no false alarms. Setting
a warning at the 5% level, or perhaps even at the 10% level, means that an operator is occasionally sent
to look for a problem when none exists. But it also means that many times a warning is given before a
problem becomes too serious and on some of these occasions action will prevent a minor upset from
becoming more serious. An occasional wild goose chase is the price paid for the early warnings.
Comments
Consider why the warning levels were determined empirically instead of by calculating the mean and
standard deviation and then using the normal distribution. People who know some statistics tend to think
of the bell-shaped, symmetrical normal distribution when they hear that “the mean is X and the standard
deviation is Y.” The words “mean” and “standard deviation” create an image of approximately 95% of
the values falling within two standard deviations of the mean.
A glance at Figure 6.6 reveals why this is an inappropriate image for the reference distribution of
moving averages. The distributions are not symmetrical and, furthermore, they are truncated. These
characteristics are especially evident in the MA(30) distribution. By definition, the effluent BOD values
are never very high when operation is stable, so the MA cannot take on certain high values. Low values of
the MA do not occur because the effluent BOD cannot be less than zero and values less than 2 mg/L
were not observed. The normal distribution, with its finite probability of values occurring far out on the
tails of the distribution (and even into negative values), would be a terrible approximation of the reference
distribution derived from the operating record.
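The mismatch can be made concrete with a small numerical sketch. The sample below is hypothetical (a short, right-skewed set of values bounded below at 2 mg/L, invented for this example and not taken from the operating record); the point is only to show how a normal distribution fitted to such data assigns probability to negative concentrations and misplaces the upper 5% level.

```python
from statistics import NormalDist, mean, stdev

# HYPOTHETICAL right-skewed sample of moving-average values (mg/L).
ma_values = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 11, 14, 20]

mu = mean(ma_values)
sigma = stdev(ma_values)
normal = NormalDist(mu, sigma)

# Probability the fitted normal assigns to physically impossible values.
p_below_zero = normal.cdf(0.0)

# Upper 5% level implied by the normal model, and the fraction of the
# sample that actually exceeds it.
normal_upper_5 = normal.inv_cdf(0.95)
empirical_exceed = sum(v > normal_upper_5 for v in ma_values) / len(ma_values)

print(f"mean = {mu:.2f}, sd = {sigma:.2f}")
print(f"normal P(value < 0) = {p_below_zero:.3f}")
print(f"normal upper 5% level = {normal_upper_5:.2f} mg/L")
print(f"fraction of sample above that level = {empirical_exceed:.2f}")
```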
The reference distribution for the daily values will always give a warning before the MA does. The
MA is conservative. It flattens one-day upsets, even fairly large ones, and rolls smoothly through short
intervals of minor disturbances without giving much notice. The moving average is like a shock absorber
on a car in that it smooths out the small bumps. Also, just as a shock absorber needs to have the right
stiffness, a moving average needs to have the right length of memory to do its job well. A 30-day MA is
an interesting statistic to plot only because effluent standards use a 30-day average, but it is too sluggish
to usefully warn of trouble. At best, it can confirm that trouble has existed. The seven-day average is more
responsive to change and serves as a better warning signal. Exponentially weighted moving averages (see
Chapter 4) are also responsive and reference distributions can be constructed for them as well.
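The difference in responsiveness can be illustrated with a simple step upset. Everything numerical here is an assumption for the sketch: the stable level of 8 mg/L, the upset level of 20 mg/L, the single warning level of 13 mg/L applied to every statistic (in practice each statistic would have its own reference distribution), and the EWMA smoothing constant of 0.3. The EWMA uses the standard recursion z_t = λ·y_t + (1 − λ)·z_(t−1).

```python
def trailing_ma(values, window):
    """Trailing moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def ewma(values, lam, start):
    """Exponentially weighted moving average, z_t = lam*y_t + (1-lam)*z_(t-1)."""
    out, z = [], start
    for y in values:
        z = lam * y + (1.0 - lam) * z
        out.append(z)
    return out


def days_to_exceed(series, level, start):
    """Days from index `start` until the series first exceeds `level`.

    A result of 1 means the level is exceeded on the first upset day.
    """
    for i in range(start, len(series)):
        if series[i] is not None and series[i] > level:
            return i - start + 1
    return None


# HYPOTHETICAL record: 40 stable days at 8 mg/L, then a step to 20 mg/L.
upset_start = 40
daily = [8.0] * upset_start + [20.0] * 40
level = 13.0  # one assumed warning level, used for all statistics

lag_daily = days_to_exceed(daily, level, upset_start)
lag_ma7 = days_to_exceed(trailing_ma(daily, 7), level, upset_start)
lag_ma30 = days_to_exceed(trailing_ma(daily, 30), level, upset_start)
lag_ewma = days_to_exceed(ewma(daily, lam=0.3, start=8.0), level, upset_start)

print(lag_daily, lag_ma7, lag_ma30, lag_ewma)
```

Under these assumptions the daily value signals on the first day of the upset, the EWMA and the MA(7) within a few days, and the MA(30) only after nearly two weeks.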
Just as there is no reason to judge process performance on the basis of only one variable, there is no
reason to select and use only one reference distribution for any particular single variable. One statistic
and its reference distribution might be most useful for process control while another is best for judging
© 2002 By CRC Press LLC