Page 355 - Statistics for Environmental Engineers

The Effect of Autocorrelation on Regression

KEY WORDS autocorrelation, autocorrelation coefficient, drift, Durbin-Watson statistic, randomization,
regression, time series, trend analysis, serial correlation, variance (inflation).

Many environmental data exist as sequences over time or space. The time sequence is obvious in some
data series, such as daily measurements on river quality. A characteristic of such data can be that neighboring
observations tend to be somewhat alike. This tendency is called autocorrelation. Autocorrelation can also
arise in laboratory experiments, perhaps because of the sequence in which experimental runs are done or
drift in instrument calibration. Randomization reduces the possibility of autocorrelated results. Data from
unplanned or unrandomized experiments should be analyzed with an eye open to detect autocorrelation.
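
The tendency of neighboring observations to be alike can be quantified with the lag-1 sample autocorrelation coefficient. The sketch below uses one common definition (the sum of products of adjacent deviations from the mean, divided by the sum of squared deviations). The function name and both example series are hypothetical and are not taken from the chapter.

```python
def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation coefficient of a sequence (one common definition)."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    # Sum of squared deviations from the mean.
    denominator = sum(d * d for d in deviations)
    # Sum of products of each deviation with the one just before it.
    numerator = sum(deviations[t] * deviations[t - 1] for t in range(1, n))
    return numerator / denominator


# Hypothetical series with a slow drift: neighbors resemble each other.
drifting = [1.0, 1.2, 1.5, 1.9, 2.0, 2.4, 2.7, 3.1]
# Hypothetical series that alternates around a constant level.
alternating = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0]

r1_drifting = lag1_autocorrelation(drifting)
r1_alternating = lag1_autocorrelation(alternating)
print(f"drifting series:    r1 = {r1_drifting:.2f}")
print(f"alternating series: r1 = {r1_alternating:.2f}")
```

A positive coefficient indicates that high values tend to follow high values, as with drift; a negative coefficient indicates that high and low values tend to alternate.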
Most statistical methods (estimation of confidence intervals, ordinary least squares regression, etc.)
depend on the residual errors being independent, having constant variance, and being normally distributed.
Independent means that the errors are not autocorrelated. The errors in statistical conclusions caused
by violating the condition of independence can be more serious than those caused by departures from normality.
Parameter estimates may or may not be seriously affected by autocorrelation, but unrecognized (or
ignored) autocorrelation will bias estimates of variances and any statistics calculated from variances.
Statements about probabilities, including confidence intervals, will then be wrong.
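
To give a sense of how far off a variance estimate can be, consider the simplest statistic, the mean of n observations. If the errors follow a first-order autoregressive pattern with lag-1 correlation rho (an assumption made here for illustration, not a model stated in the text), the variance of the mean equals the usual sigma^2/n multiplied by the factor computed in this sketch. The function name is hypothetical.

```python
def variance_inflation_of_mean(n, rho):
    """Ratio of Var(mean) under AR(1) errors to the independent-errors value sigma^2/n.

    Assumes a stationary series whose lag-k autocorrelation is rho**k.
    """
    return 1.0 + 2.0 * sum((1.0 - k / n) * rho**k for k in range(1, n))


# n = 11 matches the number of specimens in the case study;
# the rho values are illustrative only.
for rho in (0.0, 0.3, 0.6):
    factor = variance_inflation_of_mean(11, rho)
    print(f"rho = {rho:.1f}: variance of the mean is {factor:.2f} x sigma^2/n")
```

With rho = 0 the factor is exactly 1, which recovers the familiar result for independent errors. With positive rho the factor exceeds 1, so a confidence interval computed as if the errors were independent would be too narrow.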
This chapter explains why ignoring or overlooking autocorrelation can lead to serious errors and
describes the Durbin-Watson test for detecting autocorrelation in the residuals of a fitted model. Checking
for autocorrelation is relatively easy, although in small data sets it may go undetected even when
present. Making suitable provisions to incorporate existing autocorrelation into the data analysis can be
difficult. Some useful references are given, but the best approach may be to consult with a statistician.
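
As a rough sketch of the residual check mentioned above, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 are consistent with no lag-1 autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. A formal decision requires tabulated critical bounds, which are not reproduced here. The function name and the example residuals are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time (or run) order."""
    # Squared differences between each residual and the one before it.
    squared_differences = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    sum_of_squares = sum(e * e for e in residuals)
    return squared_differences / sum_of_squares


# Hypothetical residuals that drift steadily from negative to positive.
drifting_residuals = [-1.5, -0.5, 0.5, 1.5]
# Hypothetical residuals that flip sign at every step.
alternating_residuals = [1.0, -1.0, 1.0, -1.0]

d_drifting = durbin_watson(drifting_residuals)
d_alternating = durbin_watson(alternating_residuals)
print(f"drifting residuals:    d = {d_drifting:.2f}")
print(f"alternating residuals: d = {d_alternating:.2f}")
```

The residuals must be supplied in the order in which the observations were made, because the statistic depends entirely on which residuals are neighbors.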

Case Study: A Suspicious Laboratory Experiment
A laboratory experiment was done to demonstrate to students that increasing factor X by one unit should
cause factor Y to increase by one-half a unit. Preliminary experiments indicated that the standard deviation
of repeated measurements on Y was about 1 unit. To make measurement errors small relative to the
signal, the experiment was designed to produce 20 to 25 units of y. The procedure was to set x and,
after a short time, to collect a specimen on which y would be measured. The measurements on y were
not started until all 11 specimens had been collected. The data, plotted in Figure 41.1, are:

x =    0    1    2    3    4    5    6    7    8    9   10
y = 21.0 21.8 21.3 22.1 22.5 20.6 19.6 20.9 21.7 22.8 23.6

Linear regression gave ŷ = 21.04 + 0.12x, with R² = 0.12. This was an unpleasant surprise. The 95%
confidence interval of the slope was –0.12 to 0.31, which does not include the theoretical slope of 0.5
that the experiment was designed to reveal. Also, this interval includes zero, so we cannot even be sure
that x and y are related.
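
The fitted line can be checked directly from the 11 data pairs. The following sketch applies the standard least squares formulas in plain Python (the variable names are illustrative) and lists the residuals in run order, which is the natural starting point for any autocorrelation check.

```python
# Case study data, in the order the specimens were collected.
x = list(range(11))
y = [21.0, 21.8, 21.3, 22.1, 22.5, 20.6, 19.6, 20.9, 21.7, 22.8, 23.6]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Corrected sums of squares and cross products.
sxx = sum((xi - x_mean) ** 2 for xi in x)
syy = sum((yi - y_mean) ** 2 for yi in y)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

# Ordinary least squares estimates for the straight-line model.
slope = sxy / sxx
intercept = y_mean - slope * x_mean
r_squared = sxy**2 / (sxx * syy)

# Residuals (observed minus fitted), kept in run order.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"fitted line: y = {intercept:.2f} + {slope:.2f} x")
print(f"R squared:   {r_squared:.2f}")
for xi, ei in zip(x, residuals):
    print(f"x = {xi:2d}  residual = {ei:+.2f}")
```

Printing the residuals in sequence makes it easy to look for runs of same-sign values, which is an informal sign that neighboring errors are not independent.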
