Page 61 - Statistics for Environmental Engineers












                       Comments
                       We are almost always interested in knowing the shape of a sample’s distribution. Often it is important
                       to know whether a set of data is distributed symmetrically about a central value, or whether there is a
                       tail of data toward a high or a low value. It may be important to know what fraction of time a critical
                       value is exceeded.
                        Dot plots and probability plots are useful graphical tools for seeing the shape of a distribution. To
                       avoid misinterpreting probability plots, use them only in conjunction with other plots. Make dot diagrams
                       and, if the data are sequential in time, a time series plot. Sometimes these graphs provide all the important
                       information and the probability plot is unnecessary.
                        Probability plots are convenient for estimating percentile values, especially the median (50th
                       percentile) and extreme values. It is not necessary for the probability plot to be a straight line to do this. If it
                       is straight, draw a straight line. But if it is not straight, draw a smooth curve through the plotted points
                       and go ahead with the estimation.
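As a rough illustration of reading percentiles off a plot that need not be straight, the sketch below interpolates between the plotted points on the normal-quantile axis. It assumes the plotting position p = (i - 0.5)/n, which is only one of several formulas in use, and it substitutes piecewise-linear interpolation for the smooth curve one would draw by eye. The function names and the concentration values are hypothetical.

```python
from statistics import NormalDist


def plotting_positions(n):
    # Assumed plotting position p_i = (i - 0.5) / n; other formulas exist.
    return [(i - 0.5) / n for i in range(1, n + 1)]


def percentile_from_plot(data, p):
    """Read the value at cumulative probability p off the plotted points.

    Linear interpolation on the normal-quantile (z) axis stands in for a
    hand-drawn smooth curve; no straight-line fit is assumed.
    """
    x = sorted(data)
    z = [NormalDist().inv_cdf(q) for q in plotting_positions(len(x))]
    zp = NormalDist().inv_cdf(p)
    if zp <= z[0]:
        return x[0]   # do not extrapolate below the smallest plotted point
    if zp >= z[-1]:
        return x[-1]  # do not extrapolate above the largest plotted point
    for k in range(1, len(x)):
        if zp <= z[k]:
            frac = (zp - z[k - 1]) / (z[k] - z[k - 1])
            return x[k - 1] + frac * (x[k] - x[k - 1])


# Hypothetical right-skewed concentrations (mg/L), for illustration only.
conc = [1.2, 1.5, 1.9, 2.1, 2.4, 3.0, 3.8, 5.5, 9.7]
median_est = percentile_from_plot(conc, 0.50)
p90_est = percentile_from_plot(conc, 0.90)
print("median estimate:", median_est)
print("90th percentile estimate:", p90_est)
```

Because the sketch never fits a line, it works the same way whether the plotted points fall on a straight line or on a curve.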
                        Do not use probability plots to estimate the mean and standard deviation except in the very special
                       case when the data give a linear plot on normal probability paper.  This special case is common in
                       textbooks, but rare with real environmental data. If the data plot as a straight line on log-probability
                       paper, the 50th percentile value is not the mean (it is the geometric mean) and there is no distance that
                       can be measured on the plot to estimate the standard deviation.
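The distinction between the 50th percentile and the mean can be checked numerically. The sketch below uses simulated lognormal data (the seed, sample size, and parameters are arbitrary assumptions, not values from the text) and compares the median, the geometric mean, and the arithmetic mean.

```python
import math
import random
from statistics import geometric_mean, mean, median

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Simulated data with ln(x) ~ Normal(mu=1, sigma=1); illustrative only.
x = [rng.lognormvariate(1.0, 1.0) for _ in range(2001)]

x_median = median(x)       # the 50th percentile
x_geo = geometric_mean(x)  # exp(mean of ln x)
x_mean = mean(x)           # arithmetic mean, pulled upward by the long tail

print(f"median          = {x_median:.2f}")
print(f"geometric mean  = {x_geo:.2f}")
print(f"arithmetic mean = {x_mean:.2f}")
```

For data like these, the median and geometric mean should nearly agree, while the arithmetic mean is noticeably larger.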
                        Probability plots may be useful in discovering the distribution of the data in a sample. Sometimes the
                       analysis is not clear-cut. Because of random sampling variation, the curve can have a substantial amount
                       of “wiggle” when the data actually are normally distributed. When the number of observations approaches
                       50, the shape of the probability distribution becomes much clearer than when the sample is small
                       (for example, 20 observations). Hahn and Shapiro (1967) point out that:

                          1. The variance of points in the tails (extreme low or high plotted values) will be larger than
                             that of points at the center of the distribution. Thus, the relative linearity of the plot near the
                             tails of the distribution will often seem poorer than at the center even if the correct model
                             for the probability density distribution has been chosen.
                          2. The plotted points are ordered and hence are not independent. Thus, we should not expect
                             them to be randomly scattered about a line. For example, the points immediately following
                             a point above the line are also likely to be above the line. Even if the chosen model is correct,
                             the plot may consist of a series of successive points (known as runs) above and below the line.
                          3. A model can never be proven to be adequate on the basis of sample data. Thus, the probability
                             plot of a small sample taken from a near-normal distribution will frequently not differ appreciably
                             from that of a sample from a normal distribution.
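The first two points can be illustrated with a small simulation. The sketch below is only a demonstration under assumed settings (samples of 20 standard-normal values, 2000 replicates, an arbitrary seed). It compares the variance of the lowest ordered value with that of a central ordered value, and computes the correlation between two adjacent ordered values.

```python
import random
from statistics import mean, variance

rng = random.Random(7)  # arbitrary seed for repeatability
n, reps = 20, 2000      # assumed sample size and number of replicates

# Each entry is one sorted sample of n standard-normal values.
samples = [sorted(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(reps)]


def column(k):
    """Return the k-th ordered value (0-based) from every simulated sample."""
    return [s[k] for s in samples]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


var_lowest = variance(column(0))       # point in the lower tail
var_middle = variance(column(n // 2))  # point near the center
r_adjacent = pearson(column(n // 2 - 1), column(n // 2))

print(f"variance of lowest ordered value : {var_lowest:.3f}")
print(f"variance of central ordered value: {var_middle:.3f}")
print(f"correlation of adjacent values   : {r_adjacent:.3f}")
```

A larger variance in the tail and a strong positive correlation between neighbors are what produce ragged tails and runs of points on one side of the line, even when the model is correct.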

                        If the data have positive skew, it is often convenient to use graph paper that has a log scale on one
                       axis and a normal probability scale on the other axis. If the logarithms of the data are normally distributed,
                       this kind of graph paper will produce a straight-line probability plot. The log scale may provide a
                       convenient scaling for the graph even if it does not produce a straight-line plot; for example, when the
                       data are bacterial counts that range from 10 to 100,000.
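The coordinates of a log-probability plot can be computed directly rather than read from special graph paper. The sketch below assumes the plotting position p = (i - 0.5)/n and uses made-up bacterial counts spanning 10 to 100,000. The correlation coefficient is included only as a crude indicator of straightness; it is not a formal test of the distribution.

```python
import math
from statistics import NormalDist, mean


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


# Hypothetical bacterial counts (illustration only).
counts = [10, 40, 90, 300, 800, 2500, 9000, 30000, 100000]
n = len(counts)
x = sorted(counts)

# Normal-probability axis: z-scores of the assumed plotting positions.
z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
# Log axis: base-10 logarithms compress 10..100,000 into 1..5.
log_x = [math.log10(v) for v in x]

r_raw = pearson(z, x)      # straightness on an arithmetic scale
r_log = pearson(z, log_x)  # straightness on a log scale

for zi, xi, li in zip(z, x, log_x):
    print(f"z = {zi:6.2f}   count = {xi:7d}   log10(count) = {li:.2f}")
print(f"r (raw) = {r_raw:.3f}   r (log) = {r_log:.3f}")
```

Plotting `log_x` against `z` corresponds to using paper with a log scale on one axis and a normal probability scale on the other.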




                       References
                       Hahn, G. J. and S. S. Shapiro (1967). Statistical Methods for Engineers, New York, John Wiley.
                       Hirsch, R. M. and J. D. Stedinger (1987). “Plotting Positions for Historical Floods and Their Precision,” Water
                           Resources Research, 23(4), 715–727.
                       Mage, D. T. (1982). “An Objective Graphical Method for Testing Normal Distributional Assumptions Using
                           Probability Plots,” Am. Statistician, 36, 116–120.
                       © 2002 By CRC Press LLC