Page 355 - Statistics for Environmental Engineers









                       41




                       The Effect of Autocorrelation on Regression






                       KEY WORDS autocorrelation, autocorrelation coefficient, drift, Durbin-Watson statistic,
                       randomization, regression, time series, trend analysis, serial correlation, variance (inflation).

                       Many environmental data exist as sequences over time or space. The time sequence is obvious in some
                       data series, such as daily measurements on river quality. A characteristic of such data can be that neighboring
                       observations tend to be somewhat alike. This tendency is called autocorrelation. Autocorrelation can also
                       arise in laboratory experiments, perhaps because of the sequence in which experimental runs are done or
                       drift in instrument calibration. Randomization reduces the possibility of autocorrelated results. Data from
                       unplanned or unrandomized experiments should be analyzed with an eye open to detect autocorrelation.
                        Most statistical methods (estimation of confidence intervals, ordinary least squares regression, etc.)
                       depend on the residual errors being independent, having constant variance, and being normally
                       distributed. Independent means that the errors are not autocorrelated. The errors in statistical conclusions
                       caused by violating the condition of independence can be more serious than those caused by non-normality.
                        Parameter estimates may or may not be seriously affected by autocorrelation, but unrecognized (or
                       ignored) autocorrelation will bias estimates of variances and any statistics calculated from variances.
                       Statements about probabilities, including confidence intervals, will be wrong.
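
The variance bias can be seen in a small simulation. The sketch below is an illustration only and is not taken from the text: it assumes the errors follow a first-order autoregressive, AR(1), model, and the names `ar1_series` and `variance_ratio` and the settings (n = 20, rho = 0.7, 2000 replicates) are hypothetical choices. It compares the actual variance of the sample mean over many simulated series with the usual estimate s²/n, which assumes independent errors.

```python
import random
import statistics


def ar1_series(n, rho, rng):
    """Return n values of a stationary AR(1) series: e[t] = rho*e[t-1] + a[t].

    The innovations a[t] are independent N(0, 1). The AR(1) form is an
    assumption made for this illustration.
    """
    # Start from the stationary distribution so there is no warm-up period.
    e = rng.gauss(0.0, 1.0 / (1.0 - rho ** 2) ** 0.5)
    series = []
    for _ in range(n):
        series.append(e)
        e = rho * e + rng.gauss(0.0, 1.0)
    return series


def variance_ratio(n=20, rho=0.7, reps=2000, seed=1):
    """Ratio of the actual variance of the mean to the average naive s^2/n."""
    rng = random.Random(seed)
    means = []
    naive = []
    for _ in range(reps):
        y = ar1_series(n, rho, rng)
        means.append(statistics.fmean(y))
        naive.append(statistics.variance(y) / n)  # assumes independence
    return statistics.variance(means) / statistics.fmean(naive)


print("rho = 0.0:", round(variance_ratio(rho=0.0), 2))
print("rho = 0.7:", round(variance_ratio(rho=0.7), 2))
```

A ratio near 1 means the naive estimate is about right. A ratio well above 1 means the naive estimate understates the true variance, so confidence intervals built from it would be too narrow.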
                        This chapter explains why ignoring or overlooking autocorrelation can lead to serious errors and
                       describes the Durbin-Watson test for detecting autocorrelation in the residuals of a fitted model. Checking
                       for autocorrelation is relatively easy, although in small data sets it may go undetected even when
                       present. Making suitable provisions to incorporate existing autocorrelation into the data analysis can be
                       difficult. Some useful references are given, but the best approach may be to consult with a statistician.
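
As a preview of the Durbin-Watson test, the sketch below computes the statistic from its usual definition, d = Σ(e_t − e_{t−1})² / Σ e_t², where the e_t are residuals taken in their time (or run) order. The function name `durbin_watson` and both residual sequences are hypothetical, made up only to show how the statistic behaves. Values of d near 2 are consistent with no lag-one autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time (run) order."""
    # Sum of squared differences between successive residuals.
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    # Sum of squared residuals.
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals, for illustration only (not data from the text).
drifting = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]       # neighbors alike
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]    # neighbors opposite

print("drifting:   ", round(durbin_watson(drifting), 3))
print("alternating:", round(durbin_watson(alternating), 3))
```

Judging whether a computed d is significantly different from 2 requires the tabulated Durbin-Watson bounds, which this sketch does not include.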




                       Case Study: A Suspicious Laboratory Experiment
                       A laboratory experiment was done to demonstrate to students that increasing factor x by one unit should
                       cause factor y to increase by one-half unit. Preliminary experiments indicated that the standard deviation
                       of repeated measurements on y was about 1 unit. To make measurement errors small relative to the
                       signal, the experiment was designed to produce 20 to 25 units of y. The procedure was to set x and,
                       after a short time, to collect a specimen on which y would be measured. The measurements on y were
                       not started until all 11 specimens had been collected. The data, plotted in Figure 41.1, are:


                            x =   0     1     2     3     4     5     6     7     8     9    10
                            y =   21.0  21.8  21.3  22.1  22.5  20.6  19.6  20.9  21.7  22.8  23.6

                        Linear regression gave $\hat{y} = 21.04 + 0.12x$, with $R^2 = 0.12$. This was an unpleasant surprise. The 95%
                       confidence interval of the slope was –0.12 to 0.36, which does not include the theoretical slope of 0.5
                       that the experiment was designed to reveal. Also, this interval includes zero so we cannot even be sure
                       that x and y are related.
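
The fit can be checked directly from the 11 data pairs. The sketch below is a minimal ordinary least squares calculation using only the Python standard library. The function name `fit_line` is hypothetical, and the constant 2.262 is the tabulated two-sided 95% value of Student's t for n − 2 = 9 degrees of freedom. The interval is the usual one, which assumes independent errors, the very assumption this case study calls into question.

```python
import math

# Case-study data as listed above.
x = list(range(11))
y = [21.0, 21.8, 21.3, 22.1, 22.5, 20.6, 19.6, 20.9, 21.7, 22.8, 23.6]

T_CRIT = 2.262  # tabulated t value, 95% two-sided, 9 degrees of freedom


def fit_line(x, y, t_crit):
    """Least squares straight line with R^2 and a slope confidence interval."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    sse = syy - slope * sxy                    # residual sum of squares
    se_slope = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    half_width = t_crit * se_slope
    return {
        "intercept": intercept,
        "slope": slope,
        "r2": 1.0 - sse / syy,
        "ci": (slope - half_width, slope + half_width),
    }


fit = fit_line(x, y, T_CRIT)
print("intercept = %.2f, slope = %.2f, R2 = %.2f" % (fit["intercept"], fit["slope"], fit["r2"]))
print("95%% CI for slope: %.2f to %.2f" % fit["ci"])
```

Running it gives the fitted coefficients and R² to compare with the values quoted above, and shows whether the interval for the slope covers zero or the theoretical value of 0.5.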



                       © 2002 By CRC Press LLC