Page 49 - Statistics for Environmental Engineers
4
Smoothing Data
KEY WORDS moving average, exponentially weighted moving average, weighting factors, smoothing,
and median smoothing.
Smoothing is drawing a smooth curve through data in order to eliminate the roughness (scatter) that blurs
the fundamental underlying pattern. It sharpens our focus by unhooking our eye from the irregularities.
Smoothing can be thought of as a decomposition of the data. In curve fitting, this decomposition has
the general relation: data = fit + residuals. In smoothing, the analogous expression is: data = smooth +
rough. Because the smooth is intended to be smooth (as the “fit” is smooth in curve fitting), we usually
show its points connected. Similarly, we show the rough (or residuals) as separated points, if we show
them at all. We may choose to show only those rough (residual) points that stand out markedly from
the smooth (Tukey, 1977).
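The decomposition data = smooth + rough can be illustrated with a short sketch. The example below is only an assumption-laden illustration: it uses a three-point running median as the smooth (one of several possible choices), leaves the end points unchanged, and works on made-up numbers. The function names `running_median` and `decompose` are hypothetical and chosen for this example only.

```python
from statistics import median


def running_median(values, window=3):
    """Centered running median; end points are copied unchanged (an assumed convention)."""
    half = window // 2
    smooth = list(values)
    for i in range(half, len(values) - half):
        smooth[i] = median(values[i - half:i + half + 1])
    return smooth


def decompose(values, window=3):
    """Split the data into a smooth and a rough so that data = smooth + rough."""
    smooth = running_median(values, window)
    rough = [y - s for y, s in zip(values, smooth)]
    return smooth, rough


# Made-up observations with one value that stands out from its neighbors.
data = [5.0, 6.0, 5.5, 12.0, 6.0, 6.5, 7.0]
smooth, rough = decompose(data)

# Show only the rough points that stand out markedly (threshold is arbitrary).
standouts = [(i, r) for i, r in enumerate(rough) if abs(r) > 3.0]
```

Plotting `smooth` as a connected line and only the `standouts` as separate points follows the display convention described above.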
We will discuss several methods of smoothing to produce graphs that are especially useful with time
series data from treatment plants and complicated environmental systems. The methods are well established
and have a long history of successful use in industry and econometrics. The methods are effective
and economical in terms of time and money. They are simple; they are useful to everyone, regardless
of statistical expertise. Only elementary arithmetic is needed. A computer may be helpful, but is not
needed, especially if one keeps the plot up-to-date by adding points daily or weekly as they become
available.
In statistics and quality control literature, one finds mathematics and theory that can embellish these
graphs. A formal statistical analysis, such as adding control limits, can become quite complex because
environmental data often badly violate the assumptions on which such tests are usually based.
These embellishments are discussed in another chapter.
Smoothing Methods
One method of smoothing would be to fit a straight line or polynomial curve to the data. Aside from
the computational bother, this is not a useful general procedure because the very fact that smoothing is
needed means that we cannot see the underlying pattern clearly enough to know what particular polynomial
would be useful.
The simplest smoothing method is to plot the data on a logarithmic scale (or plot the logarithm of y
instead of y itself). Smoothing by plotting the moving averages (MA) or exponentially weighted moving
averages (EWMA) requires only arithmetic.
A moving average (MA) gives equal weight to a sequence of past values; the weight depends on how
many past values are to be remembered (averaging k values gives each a weight of 1/k). The EWMA
gives more weight to recent events and progressively forgets the past. How quickly the past is
forgotten is determined by one parameter. The EWMA will follow the current observations more closely
than the MA. Often this is desirable, but this responsiveness is purchased by a loss in smoothing.
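The difference between the two averages can be seen in a minimal sketch. The formulas here are assumptions, since this passage does not state them: the MA is taken as the mean of the current value and the k - 1 values before it, and the EWMA uses the common recursive form z[i] = lam * y[i] + (1 - lam) * z[i-1], started at the first observation. The parameter name `lam`, the function names, and the data are hypothetical.

```python
def moving_average(values, k):
    """Mean of the current value and the k-1 values before it (equal weights 1/k)."""
    out = []
    for i in range(k - 1, len(values)):
        window = values[i - k + 1:i + 1]
        out.append(sum(window) / k)
    return out


def ewma(values, lam):
    """Assumed recursion: z[i] = lam*y[i] + (1-lam)*z[i-1], with z[0] = y[0]."""
    smoothed = [values[0]]
    for y in values[1:]:
        smoothed.append(lam * y + (1 - lam) * smoothed[-1])
    return smoothed


# Made-up series with a step change from 10 to 20 at the fifth observation.
data = [10.0, 10.0, 10.0, 10.0, 20.0, 20.0, 20.0, 20.0]

ma = moving_average(data, k=4)   # first value corresponds to the 4th observation
ew = ewma(data, lam=0.5)         # one value per observation
```

With these particular settings, the EWMA moves halfway to the new level at the first observation after the step, while the four-point MA moves only one quarter of the way. How the two compare in general depends on the choice of k and of the EWMA parameter.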
The choice of a smoothing method might be influenced by the application. Because the EWMA forgets
the past, it may give a more realistic representation of the actual threat of the pollutant to the environment.
© 2002 By CRC Press LLC