Statistics for Environmental Engineers










                       5




                       Seeing the Shape of a Distribution






                       KEY WORDS dot diagram, histogram, probability distribution, cumulative probability distribution,
                       frequency diagram.

                       The data in a sample have some frequency distribution, perhaps symmetrical or perhaps skewed. The
                       statistics (mean, variance, etc.) computed from these data also have some distribution. For example, if the
                       problem is to establish a 95% confidence interval on the mean, it is not critical that the data be normally
                       distributed, because the distribution of the sample mean tends toward normal as the sample size grows,
                       regardless of the distribution of the data.
                       In contrast, if the problem is to estimate how frequently a certain value will be exceeded, it is essential to
                       base the estimate on the correct distribution of the sample. This chapter is about the shape of the distribution
                       of the data in the sample and not the distribution of statistics computed from the sample.
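
The contrast between the distribution of the data and the distribution of the mean can be illustrated with a small simulation. The sketch below is not from the text: it assumes a lognormal parent distribution with arbitrary parameters, a sample size of 50, and a hypothetical helper named skewness. It compares the skewness of individual observations with the skewness of many sample means.

```python
import random
import statistics


def skewness(values):
    """Skewness as the average cubed standardized deviation (population form)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def simulate_means(n_per_sample, n_samples, rng):
    """Return the mean of each of n_samples lognormal samples of size n_per_sample."""
    means = []
    for _ in range(n_samples):
        sample = [rng.lognormvariate(0.0, 1.0) for _ in range(n_per_sample)]
        means.append(statistics.fmean(sample))
    return means


rng = random.Random(12345)  # fixed seed so the sketch is repeatable

# Individual observations from a strongly right-skewed (assumed) distribution.
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]

# Means of samples of 50 observations drawn from the same distribution.
means = simulate_means(n_per_sample=50, n_samples=5000, rng=rng)

skew_raw = skewness(raw)
skew_means = skewness(means)
print(f"skewness of individual observations: {skew_raw:.2f}")
print(f"skewness of means of 50 observations: {skew_means:.2f}")
```

The means should be much less skewed than the individual observations, although for a parent distribution this skewed they need not be exactly symmetric at a sample size of 50.
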
                        Many times the first analysis done on a set of data is to compute the mean and standard deviation. These
                       two statistics fully characterize a normal distribution. They do not fully describe other distributions. We
                       should not assume that environmental data will be normally distributed. Experience shows that stream quality
                       data, wastewater treatment plant influent and effluent data, soil properties, and air quality data typically do
                       not have normal distributions. They are more likely to have a long tail skewed toward high values (positive
                       skewness). Fortunately, one need not assume the distribution. It can be discovered from the data.
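
A short numerical sketch shows why the mean and standard deviation alone cannot describe a skewed distribution. The values below are invented for illustration and are not from the text. Reflecting a right-skewed sample about its mean gives a second sample with the same mean and the same standard deviation but the tail on the opposite side. The reflection is a purely numerical device, so some reflected values are negative and would not be physical concentrations.

```python
import statistics

# Hypothetical right-skewed values, invented for illustration only.
skewed = [2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.5, 6.0, 9.0, 15.0]

# Reflect every value about the mean: same mean and spread, opposite tail.
center = statistics.fmean(skewed)
mirrored = [2 * center - x for x in skewed]


def skewness(values):
    """Skewness as the average cubed standardized deviation (population form)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


for name, data in (("skewed", skewed), ("mirrored", mirrored)):
    print(
        f"{name:9s} mean={statistics.fmean(data):.3f} "
        f"sd={statistics.stdev(data):.3f} skewness={skewness(data):+.3f}"
    )
```

The two summaries agree on the mean and standard deviation and differ only in the sign of the skewness, which is the information a plot of the data would show directly.
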
                        Simple plots help reveal the sample’s distribution. Some of these plots have already been discussed
                       in Chapters 2 and 3. Dot diagrams are particularly useful, yet these simple plots have been overlooked and
                       underused. Environmental engineering references are likely to recommend, by example if not explicitly,
                       the construction of a probability plot (also known as the cumulative frequency plot). Probability
                       plots can be useful. Their construction and interpretation, and the ways in which such plots can be
                       misused, will be discussed.



                       Case Study: Industrial Waste Survey Data Analysis
                       The BOD (5-day) data given in Table 5.1 were obtained from an industrial wastewater survey (U.S. EPA,
                       1973). There are 99 observations, each measured on a 4-hr composite sample, giving six observations
                       daily for 16 days, plus three observations on the 17th day. The survey was undertaken to estimate the
                       average BOD and to estimate the concentration that is exceeded some small fraction of the time (for
                       example, 10%). This information is needed to design a treatment process. The pattern of variation also
                       needs to be seen because it will influence the feasibility of using an equalization process to reduce the
                       variation in BOD loading. The data may have other interesting properties, so the data presentation should
                       be complete, clear, and not open to misinterpretation.
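
The two survey objectives, the average and the concentration exceeded about 10% of the time, can be sketched numerically. The code below does not use the Table 5.1 data. It generates synthetic stand-in values from an assumed lognormal distribution with arbitrary parameters. Only the observation count follows the sampling plan described above. The empirical 90th percentile is compared with the value a normal distribution would imply.

```python
import math
import random
import statistics

# Sampling plan from the case study: six 4-hr composites per day
# for 16 days, plus three on the 17th day.
n_obs = 6 * 16 + 3

# SYNTHETIC stand-in data (mg/L), NOT the survey values in Table 5.1.
# The lognormal shape and its parameters are arbitrary assumptions.
rng = random.Random(2002)
bod = [rng.lognormvariate(math.log(650.0), 0.35) for _ in range(n_obs)]

mean_bod = statistics.fmean(bod)
sd_bod = statistics.stdev(bod)

# Empirical estimate: the 90th percentile of the observations themselves.
p90_empirical = statistics.quantiles(bod, n=10)[-1]

# Normal-theory estimate: appropriate only if the data really are normal.
p90_normal = statistics.NormalDist(mean_bod, sd_bod).inv_cdf(0.90)

# Fraction of observations above the empirical estimate (should be near 0.10).
frac_above = sum(v > p90_empirical for v in bod) / n_obs

print(f"observations:            {n_obs}")
print(f"mean:                    {mean_bod:.0f} mg/L")
print(f"empirical 90th pct:      {p90_empirical:.0f} mg/L")
print(f"normal-theory 90th pct:  {p90_normal:.0f} mg/L")
print(f"fraction above empirical estimate: {frac_above:.3f}")
```

How far the two estimates differ depends on how far the data depart from normality, which is why the shape of the distribution matters for exceedance questions.
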



                       Dot Diagrams

                       Figure 5.1 is a time series plot of the data. The concentration fluctuates rapidly with more or less equal
                       variation above and below the average, which is 687 mg/L. The range is from 207 to 1185 mg/L. The
                       BOD may change by nearly 1000 mg/L from one sampling interval to the next. It is not clear whether the ups
                       and downs are random or are part of some cyclic pattern. There is little else to be seen from this plot.
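
The quantities read from a time series plot (the average, the range, and the largest change between successive samples) can also be computed directly. The values below are hypothetical and are not the survey data. They stand in for two days of 4-hr composite samples.

```python
import statistics

# Hypothetical 4-hr composite BOD values (mg/L) in time order; not Table 5.1.
bod = [520, 910, 640, 300, 1120, 760, 450, 980, 610, 700, 240, 830]

mean_bod = statistics.fmean(bod)
value_range = (min(bod), max(bod))

# Change from each observation to the next one in time order.
steps = [later - earlier for earlier, later in zip(bod, bod[1:])]
largest_step = max(abs(s) for s in steps)

print(f"mean: {mean_bod:.0f} mg/L")
print(f"range: {value_range[0]} to {value_range[1]} mg/L")
print(f"largest change between successive samples: {largest_step} mg/L")
```

These summaries say nothing about whether the fluctuations are random or cyclic, which matches the limits of the plot noted above.
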


                       © 2002 By CRC Press LLC