Page 57 - Statistics for Environmental Engineers

                       FIGURE 5.3 Dot diagrams of the data for each sampling time (BOD concentration, mg/L, for each time of day: 4 am, 8 am, 12 N, 4 pm, 8 pm, 12 MN).

                       normal, the extreme values would be relatively rare in comparison to other values. Here, they are no
                       rarer than values near the average. The designer may feel that the rapid fluctuation, with no tendency
                       to cluster toward one average or central value, is the most important feature of the data.
                        The elegantly simple dot diagram and the time series plot have beautifully described the data. No
                       numerical summary could transmit the same information as efficiently and clearly. Assuming a “normal-
                       like” distribution and reporting the average and standard deviation would be very misleading.


                       Probability Plots

                       A probability plot is not needed to interpret the data in Table 5.1 because the time series plot and dot
                       diagrams expose the important characteristics of the data. It is instructive, nevertheless, to use these data
                       to illustrate how a probability plot is constructed, how its shape is related to the shape of the frequency
                       distribution, and how it could be misused to estimate population characteristics.
                        The probability plot, or cumulative frequency distribution, shown in Figure 5.4 was constructed by
                       ranking the observed values from small to large, assigning each value a rank, which will be denoted by
                       i, and calculating the plotting position on the probability scale as p = i/(n + 1), where n is the total
                       number of observations. A portion of the ranked data and their calculated plotting positions are shown
                       in Table 5.2. The relation p = i/(n + 1) has traditionally been used by engineers. Statisticians seem to
                       prefer p = (i − 0.5)/n, especially when n is small.¹ The major differences in plotting position values
                       computed from these formulas occur in the tails of the distribution (high and low ranks). These differences
                       diminish in importance as the sample size increases.
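The ranking procedure and the two plotting-position formulas can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used to build Figure 5.4 or Table 5.2: the function names are hypothetical, ties are simply ranked in sorted order, and the BOD values are invented for illustration rather than taken from Table 5.1. The gap between the two formulas is largest for the lowest and highest ranks and shrinks roughly as 1/(2n) as the sample size grows.

```python
def plotting_positions(values):
    """Rank values from small to large and return rows of
    (rank i, value, p = i/(n + 1), p = (i - 0.5)/n).

    Ties are ranked in sorted order (a simplifying assumption).
    """
    n = len(values)
    return [(i, x, i / (n + 1), (i - 0.5) / n)
            for i, x in enumerate(sorted(values), start=1)]


def formula_gaps(n):
    """Absolute difference between the two formulas for each rank i = 1..n."""
    return [abs(i / (n + 1) - (i - 0.5) / n) for i in range(1, n + 1)]


# Hypothetical BOD values (mg/L), for illustration only.
sample = [620, 310, 980, 450, 1150, 270, 830, 540]
for i, x, p_eng, p_stat in plotting_positions(sample):
    print(f"{i:2d}  {x:5d}  {p_eng:.3f}  {p_stat:.3f}")

# The largest gap (at i = 1 and i = n) shrinks as the sample size grows.
for n in (10, 100, 1000):
    print(n, round(max(formula_gaps(n)), 5))
```

For the middle rank of an odd-sized sample both formulas give exactly p = 0.5, which is why the choice matters mostly in the tails.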
                        Figure 5.4(top) is a normal probability plot of the data, so named because the probability scale (the
                       ordinate) is arranged in a special way to give a straight line plot when the data are normally distributed.
                       Any frequency distribution that is not normal will plot as a curve on the normal probability scale used
                       in Figure 5.4(top). The abscissa is an arithmetic scale showing the BOD concentration. The ordinate is
                       a cumulative probability scale on which the calculated p values are plotted to show the probability that
                       the BOD is less than the value shown on the abscissa.
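A normal probability scale can be imitated on ordinary arithmetic axes by converting each plotting position p into the standard normal quantile z whose cumulative probability is p; a tick labeled p is then drawn at height z. The sketch below assumes Python 3.8 or later for `statistics.NormalDist` and uses p = i/(n + 1). The helper name and the demonstration data (constructed to be exactly normal, with mean 500 and standard deviation 150) are hypothetical. For such data every point satisfies x = 500 + 150z, so the points fall on a straight line.

```python
from statistics import NormalDist


def normal_plot_points(values):
    """Return (x, z) pairs for a normal probability plot.

    x is a ranked observation (abscissa). z is the standard normal quantile
    of its plotting position p = i/(n + 1), i.e. the height at which p sits
    on a normal probability scale.
    """
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(x, std_normal.inv_cdf(i / (n + 1)))
            for i, x in enumerate(sorted(values), start=1)]


# Hypothetical data built to be exactly "normal": the i/(n + 1) quantiles of
# a normal distribution with mean 500 and standard deviation 150.
n = 15
model = NormalDist(mu=500, sigma=150)
demo = [model.inv_cdf(i / (n + 1)) for i in range(n, 0, -1)]  # unsorted order

for x, z in normal_plot_points(demo):
    print(f"x = {x:7.1f}   z = {z:+.3f}   p = {NormalDist().cdf(z):.3f}")
```

Plotting z against x with any charting tool, and relabeling the z axis with the corresponding p values, reproduces the layout of a normal probability plot.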
                        Figure 5.4 shows that the BOD data are distributed symmetrically but not in the form of a normal
                       distribution. The S-shaped curve is characteristic of distributions that have more observations in the tails than
                       predicted by the normal distribution. This kind of distribution is called “heavy tailed.” A data set that is
                       light-tailed (peaked) or skewed will also have an S-shape, but with different curvature (Hahn and Shapiro, 1967).
                        There is often no reason to make the probability plot take the form of a straight line. If a straight line
                       appears to describe the data, draw such a line on the graph “by eye.” If a straight line does not appear
                       to describe the points, and you feel that a line needs to be drawn to emphasize the pattern, draw a


                       ¹ There are still other possibilities for the probability plotting positions (see Hirsch and Stedinger, 1987). Most have the
                       general form p = (i − a)/(n + 1 − 2a), where a is a constant between 0.0 and 0.5. Some values are: a = 0 (Weibull),
                       a = 0.5 (Hazen), and a = 0.375 (Blom).
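The general form given in this footnote can be written as one small function. The sketch below is illustrative only: the function name and the range checks are assumptions, while the three named values of a come from the footnote. Setting a = 0 recovers i/(n + 1) and a = 0.5 recovers (i − 0.5)/n. For any a in the family, the positions are symmetric, so the values for rank i and rank n + 1 − i sum to 1.

```python
def plotting_position(i, n, a):
    """General plotting position p = (i - a)/(n + 1 - 2a).

    i is the rank (1 = smallest), n the number of observations, and a a
    constant assumed here to lie between 0.0 and 0.5.
    """
    if not 1 <= i <= n:
        raise ValueError("rank i must satisfy 1 <= i <= n")
    if not 0.0 <= a <= 0.5:
        raise ValueError("constant a must lie between 0.0 and 0.5")
    return (i - a) / (n + 1 - 2 * a)


# Values of a named in the footnote.
NAMED_A = {"Weibull": 0.0, "Hazen": 0.5, "Blom": 0.375}

n = 20
for name, a in NAMED_A.items():
    p_low = plotting_position(1, n, a)
    p_high = plotting_position(n, n, a)
    print(f"{name:8s} a = {a:5.3f}  p(1) = {p_low:.4f}  p({n}) = {p_high:.4f}")
```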
                       © 2002 By CRC Press LLC