Page 13 - Statistics for Environmental Engineers


                        Aberrant values. Values that stand out from the general trend are fairly common. They may occur
                       because of gross errors in sampling or measurement. They may be mistakes in data recording. If we think
                       only in these terms, it becomes too tempting to discount or throw out such values. However, rejecting
                       any value out of hand may lead to serious errors. Some early observers of stratospheric ozone
                       concentrations failed to detect the hole in the ozone layer because their computer had been programmed to screen
                       incoming data for “outliers.” The values that defined the hole in the ozone layer were disregarded. This
                       is a reminder that rogue values may be real. Indeed, they may contain the most important information.
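
One way to act on this warning is to have screening software flag unusual values for review instead of deleting them. The sketch below is only an illustration: the readings are hypothetical, and the flagging rule (more than three scaled median absolute deviations from the median) is an assumed choice, not a rule given in the text.

```python
import statistics

# Hypothetical readings; the two low values stand in for "rogue" data.
readings = [310, 305, 298, 312, 301, 180, 307, 175, 303]


def flag_unusual(values, k=3.0):
    """Return (value, is_flagged) pairs; no value is ever discarded.

    Assumed rule: flag a value when it lies more than k scaled MADs
    from the median. The factor 1.4826 makes the MAD comparable to a
    standard deviation for normally distributed data.
    """
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    limit = k * 1.4826 * mad
    return [(v, abs(v - center) > limit) for v in values]


screened = flag_unusual(readings)
flagged = [v for v, is_flagged in screened if is_flagged]
print("kept for analysis:", len(screened), "values")
print("flagged for review:", flagged)
```

Every reading stays in the data set; the flag only marks which values deserve a closer look before anyone decides whether they are errors or real signals.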
                        Censored data. Great effort and expense are invested in measurements of toxic and hazardous
                       substances that should be absent or else be present in only trace amounts. The analyst handles many
                       specimens for which the concentration is reported as “not detected” or “below the analytical method
                       detection limit.” This method of reporting censors the data at the limit of detection and condemns all
                       lower values to be qualitative. This manipulation of the data creates severe problems for the data analyst
                       and the person who needs to use the data to make decisions.
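
The difficulty can be made concrete with a small simulation. The sketch below assumes hypothetical lognormal concentrations and a hypothetical detection limit, then replaces every censored value with zero, half the limit, or the limit itself. These substitutions are shown only to expose how much the answer depends on an arbitrary choice; they are not presented as a recommended method.

```python
import random
import statistics

random.seed(3)

DETECTION_LIMIT = 1.0  # hypothetical method detection limit, ug/L

# Hypothetical "true" concentrations, which the analyst never sees in full.
true_values = [random.lognormvariate(0.0, 1.0) for _ in range(200)]
n_censored = sum(v < DETECTION_LIMIT for v in true_values)


def substituted_mean(values, limit, substitute):
    """Mean after replacing every value below the limit with `substitute`."""
    return statistics.fmean([v if v >= limit else substitute for v in values])


true_mean = statistics.fmean(true_values)
means = {
    "zero": substituted_mean(true_values, DETECTION_LIMIT, 0.0),
    "half": substituted_mean(true_values, DETECTION_LIMIT, DETECTION_LIMIT / 2),
    "limit": substituted_mean(true_values, DETECTION_LIMIT, DETECTION_LIMIT),
}

print(f"censored: {n_censored} of {len(true_values)}")
print(f"true mean (unknown in practice): {true_mean:.3f}")
for label, value in means.items():
    print(f"substitute {label:>5}: mean = {value:.3f}")
```

The three substituted means bracket the true mean, but nothing in the reported data says where inside that bracket the truth lies.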
                        Large amounts of data (which are often observational data rather than data from designed
                       experiments). Every treatment plant, river basin authority, and environmental control agency has accumulated
                       a mass of multivariate data in filing cabinets or computer databases. Most of this is happenstance data.
                       It was collected for one purpose; later it is considered for another purpose. Happenstance data are
                       often ill suited for model building. They may be ill suited for detecting trends over time or for testing
                       any hypothesis about system behavior because (1) the record is not consistent and comparable from
                       period to period, (2) not all variables that affect the system have been observed, and (3) the range of
                       variables has been restricted by the system’s operation. In short, happenstance data often contain
                       surprisingly little information. No amount of analysis can extract information that does not exist.
                        Large measurement errors. Many biological and chemical measurements have large measurement
                       errors, despite the usual care that is taken with instrument calibration, reagent preparation, and personnel
                       training. There are efficient statistical methods to deal with random errors. Replicate measurements
                       can be used to estimate the random variation, averaging can reduce its effect, and other methods can
                       compare the random variation with possible real changes in a system. Systematic errors (bias) cannot
                       be removed or reduced by averaging.
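
The contrast between random and systematic error can be checked with a short simulation. The sketch below assumes a hypothetical true value, bias, and random-error standard deviation; all three numbers are invented for illustration.

```python
import random
import statistics

random.seed(1)

TRUE_VALUE = 10.0  # hypothetical true concentration, mg/L
BIAS = 0.5         # hypothetical systematic error, mg/L
SIGMA = 2.0        # hypothetical random-error standard deviation, mg/L


def measure():
    """Return one simulated measurement with bias and random error."""
    return TRUE_VALUE + BIAS + random.gauss(0.0, SIGMA)


def mean_of_replicates(n):
    """Average of n replicate measurements."""
    return statistics.fmean([measure() for _ in range(n)])


summary = {}
for n in (1, 4, 16, 64):
    # Repeat the "average of n replicates" experiment many times.
    means = [mean_of_replicates(n) for _ in range(2000)]
    spread = statistics.stdev(means)                  # random variation left
    offset = statistics.fmean(means) - TRUE_VALUE     # systematic error left
    summary[n] = (spread, offset)
    print(f"n={n:3d}  sd of average={spread:.3f}  offset={offset:.3f}")
```

The spread of the averages shrinks roughly as one over the square root of the number of replicates, while the offset from the true value stays near the assumed bias no matter how many replicates are averaged.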
                        Lurking variables. Sometimes important variables are not measured, for a variety of reasons. Such
                       variables are called lurking variables. The problems they can cause are discussed by Box (1966) and
                       Joiner (1981). A related problem occurs when a truly influential variable is carefully kept within a narrow
                       range, with the result that the variable appears to be insignificant when it is included in a regression model.
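
The restricted-range problem is easy to reproduce. The sketch below assumes a hypothetical system in which the response truly depends on x with slope 2, then fits a straight line once with x varied widely and once with x held in a narrow band. The model, ranges, and noise level are all invented for illustration.

```python
import math
import random

random.seed(2)


def fit_slope(x, y):
    """Least-squares slope and its standard error for y = a + b*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, std_err


def simulate(lo, hi, n=50):
    """Observe the same true relationship with x confined to [lo, hi]."""
    x = [random.uniform(lo, hi) for _ in range(n)]
    y = [3.0 + 2.0 * xi + random.gauss(0.0, 1.0) for xi in x]
    return fit_slope(x, y)


wide_slope, wide_se = simulate(0.0, 10.0)
narrow_slope, narrow_se = simulate(4.9, 5.1)

print(f"wide range:   slope={wide_slope:.2f}  se={wide_se:.2f}")
print(f"narrow range: slope={narrow_slope:.2f}  se={narrow_se:.2f}")
```

The true effect is identical in both cases. With the narrow range, the standard error of the slope is so large that the fitted coefficient can easily be indistinguishable from zero.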
                        Nonconstant variance. The error associated with measurements is often nearly proportional to the
                       magnitude of their measured values rather than approximately constant over the range of the measured
                       values. Many measurement procedures and instruments introduce this property.
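
A minimal sketch of proportional error, assuming a hypothetical instrument whose random error is 10% of the true value: the standard deviation of replicates grows with the measured level, while the relative standard deviation stays about the same.

```python
import random
import statistics

random.seed(4)

RELATIVE_ERROR = 0.10  # assumed: random error is 10% of the true value


def replicate_stats(true_value, n=200):
    """Return (standard deviation, relative standard deviation) of n replicates."""
    reps = [true_value * (1.0 + random.gauss(0.0, RELATIVE_ERROR))
            for _ in range(n)]
    sd = statistics.stdev(reps)
    return sd, sd / statistics.fmean(reps)


results = {level: replicate_stats(level) for level in (1.0, 10.0, 100.0)}
for level, (sd, rsd) in results.items():
    print(f"level={level:6.1f}  sd={sd:7.3f}  relative sd={rsd:.3f}")
```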
                        Nonnormal distributions. We are strongly conditioned to think of data being symmetrically distributed
                       about their average value in the bell shape of the normal distribution. Environmental data seldom have
                       this distribution. A common asymmetric distribution has a long tail toward high values.
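
As an illustration of such a long upper tail, the sketch below draws a hypothetical sample from a lognormal distribution. The lognormal is an assumed example here; the text does not name a specific distribution. In a sample like this the mean is pulled above the median and the skewness is positive.

```python
import random
import statistics

random.seed(5)

# Hypothetical concentrations with a long tail toward high values.
sample = [random.lognormvariate(0.0, 1.0) for _ in range(1000)]

mean = statistics.fmean(sample)
median = statistics.median(sample)
sd = statistics.pstdev(sample)

# Moment coefficient of skewness: zero for a symmetric distribution,
# positive when the long tail points toward high values.
skewness = sum(((x - mean) / sd) ** 3 for x in sample) / len(sample)

print(f"mean={mean:.2f}  median={median:.2f}  skewness={skewness:.2f}")
```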
                        Serial correlation. Many environmental data occur as a sequence of measurements taken over time
                       or space. The order of the data is critical. In such data, it is common that the adjacent values are not
                       statistically independent of each other because the natural continuity over time (or space) tends to make
                       neighboring values more alike than randomly selected values. This property, called serial correlation,
                       violates the assumptions on which many statistical procedures are based. Even low levels of serial
                       correlation can distort estimation and hypothesis testing procedures.
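
The effect of modest serial correlation can be sketched with a simulated first-order autoregressive series. The model and its coefficient of 0.3 are assumptions chosen for illustration. The code estimates the lag-1 autocorrelation and then compares the actual variability of the sample mean with the standard error computed as if the observations were independent.

```python
import math
import random
import statistics

random.seed(6)

PHI = 0.3  # assumed lag-1 correlation of the simulated process


def ar1_series(n, phi=PHI):
    """Simulate x[t] = phi * x[t-1] + e[t] with unit-variance noise."""
    # Start from the stationary distribution so there is no warm-up period.
    x = random.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))
    series = []
    for _ in range(n):
        series.append(x)
        x = phi * x + random.gauss(0.0, 1.0)
    return series


def lag1_autocorrelation(series):
    """Sample correlation between each value and its successor."""
    m = statistics.fmean(series)
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    den = sum((a - m) ** 2 for a in series)
    return num / den


r1 = lag1_autocorrelation(ar1_series(2000))

N = 100
means, naive = [], []
for _ in range(1000):
    s = ar1_series(N)
    means.append(statistics.fmean(s))
    naive.append(statistics.stdev(s) / math.sqrt(N))  # assumes independence

actual_se = statistics.stdev(means)
naive_se = statistics.fmean(naive)
print(f"lag-1 autocorrelation: {r1:.2f}")
print(f"actual se of mean: {actual_se:.3f}   naive se: {naive_se:.3f}")
```

Under these assumptions the naive formula understates the uncertainty of the mean, so confidence intervals built from it would be too narrow and tests would reject too often.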
                        Complex cause-and-effect relationships. The systems of interest — the real systems in the field — are
                       affected by dozens of variables, including many that cannot be controlled, some that cannot be measured
                       accurately, and probably some that are unidentified. Even if the known variables were all controlled, as
                       we try to do in the laboratory, the physics, chemistry, and biochemistry of the system are complicated
                       and difficult to decipher. Even a system that is driven almost entirely by inorganic chemical reactions
                       can be difficult to model (for example, because of chemical complexation and amorphous solids
                       formation). The situation has been described by Box and Luceno (1997): “All models are wrong but some are
                       useful.” Our ambition is usually short of trying to discover all causes and effects. We are happy if we
                       can find a useful model.



                      © 2002 By CRC Press LLC