Page 284 - Statistics for Environmental Engineers
32
Serial Correlation
KEY WORDS ACF, autocorrelation, autocorrelation coefficient, BOD, confidence interval, correlation,
correlation coefficient, covariance, independence, lag, sample size, sampling frequency, serial correlation,
serial dependence, variance.
When data are collected sequentially, there is a tendency for observations taken close together (in time
or space) to be more alike than those taken farther apart. Stream temperatures, for example, may show
great variation over a year, while temperatures one hour apart are nearly the same. Some automated
monitoring equipment makes measurements so frequently that adjacent values are practically identical.
This tendency for neighboring observations to be related is serial correlation or autocorrelation. One
measure of the serial dependence is the autocorrelation coefficient, which is similar to the Pearson
correlation coefficient discussed in Chapter 31. Chapter 51 will deal with autocorrelation in the context of
time series modeling.
Case Study: Serial Dependence of BOD Data
A total of 120 biochemical oxygen demand (BOD) measurements were made at two-hour intervals to
study treatment plant dynamics. The data are listed in Table 32.1 and plotted in Figure 32.1. As one
would expect, measurements taken 24 h apart (12 sampling intervals) are similar. The task is to examine
this daily cycle and to assess the strength of the correlation between BOD values separated by one
to at least twelve sampling intervals.
Correlation and Autocorrelation Coefficients
Correlation between two variables x and y is estimated by the sample correlation coefficient:
$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} $$
where $\bar{x}$ and $\bar{y}$ are the sample means. The correlation coefficient (r) is a dimensionless number that can
range from −1 to +1.
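As a minimal sketch of the sample correlation coefficient, the Python function below computes r from paired observations using sums of deviations from the sample means. The function name `pearson_r` and the example numbers are illustrative assumptions and are not taken from the text.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r for paired observations x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two pairs are needed")

    # Sample means
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of cross-products and squared deviations about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)

    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up numbers for illustration only: y is an exact linear function of x.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.0, 4.0, 6.0, 8.0, 10.0]
r_demo = pearson_r(x_demo, y_demo)
```

Because the numerator and denominator carry the same units, r is dimensionless; rescaling either variable by a positive constant leaves it unchanged.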
Serial correlation, or autocorrelation, is the correlation of a variable with itself. If sufficient data are
available, serial dependence can be evaluated by plotting each observation $y_t$ against the immediately
preceding one, $y_{t-1}$. (Plotting $y_t$ vs. $y_{t+1}$ is equivalent to plotting $y_t$ vs. $y_{t-1}$.) Similar plots can be made
for observations two units apart ($y_t$ vs. $y_{t-2}$), three units apart, etc.
If measurements were made daily, a plot of $y_t$ vs. $y_{t-7}$ might indicate serial dependence in the form of
a weekly cycle. If y represented monthly averages, $y_t$ vs. $y_{t-12}$ might reveal an annual cycle. The distance
between the observations that are examined for correlation is called the lag. The convention is to measure
lag as the number of intervals between observations and not as real time elapsed. Of course, knowing
the time between observations allows us to convert between real time and lag time.
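The lagged comparisons described above can be sketched in Python as follows. The sketch pairs each observation with the one k intervals earlier and applies the ordinary correlation formula to those pairs. That is an assumption made for illustration and may differ in detail from the autocorrelation coefficient defined elsewhere in the book. The helper names (`lagged_pairs`, `lag_correlation`, `lag_to_hours`) are hypothetical, and the series is a synthetic 12-interval cycle, not the BOD data of Table 32.1.

```python
import math


def lagged_pairs(y, k):
    """Return (y_t, y_{t-k}) pairs for every t that has a value k intervals earlier."""
    if k < 1 or k >= len(y):
        raise ValueError("lag k must satisfy 1 <= k < len(y)")
    return [(y[t], y[t - k]) for t in range(k, len(y))]


def lag_correlation(y, k):
    """Correlation between y_t and y_{t-k}, computed from the lagged pairs."""
    pairs = lagged_pairs(y, k)
    if len(pairs) < 2:
        raise ValueError("at least two lagged pairs are needed")
    a = [p[0] for p in pairs]
    b = [p[1] for p in pairs]
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    s_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    s_aa = sum((ai - a_bar) ** 2 for ai in a)
    s_bb = sum((bi - b_bar) ** 2 for bi in b)
    if s_aa == 0 or s_bb == 0:
        raise ValueError("correlation is undefined for a constant series")
    return s_ab / math.sqrt(s_aa * s_bb)


def lag_to_hours(k, interval_hours):
    """Convert a lag (number of sampling intervals) to elapsed real time in hours."""
    return k * interval_hours


# Synthetic series for illustration only: 120 values with a 12-interval cycle.
series = [math.sin(2.0 * math.pi * t / 12.0) for t in range(120)]

r_lag12 = lag_correlation(series, 12)  # values one full cycle apart
r_lag6 = lag_correlation(series, 6)    # values half a cycle apart
hours_at_lag12 = lag_to_hours(12, 2.0) # 12 two-hour intervals
```

A lag of k leaves only n − k pairs, so correlations at large lags rest on fewer observations than those at small lags.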
© 2002 By CRC Press LLC