Page 82 - Statistics for Environmental Engineers

                           Rank    575   574   573   572   571   570   569   568   567   566   565
                           BOD   10565  10385  7820  7580  7322  7123  6627  6289  6261  6079  5977

                           The 99th percentile is located at observation number p(n + 1) = 0.99(575 + 1) = 570.24. Because
                           this is not an integer, interpolate between the observations ranked 570 and 571 to estimate
                           $\hat{y}_{0.99} = 7171$.
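The interpolation step can be checked with a short script. This is a minimal sketch, not the book's code: the helper name `value_at_rank` and the dictionary `TOP_BOD` are illustrative, and the ranks are assumed to run from 1 (smallest) to n = 575 (largest), as in the table above.

```python
import math

# Eleven largest BOD values from the table above, keyed by rank.
# Assumption: rank 1 is the smallest observation and rank 575 the largest.
TOP_BOD = {575: 10565, 574: 10385, 573: 7820, 572: 7580, 571: 7322, 570: 7123,
           569: 6627, 568: 6289, 567: 6261, 566: 6079, 565: 5977}

def value_at_rank(rank, ranked_values):
    """Linearly interpolate between the two ordered values that bracket a fractional rank."""
    lower = math.floor(rank)
    frac = rank - lower
    if frac == 0:
        return ranked_values[lower]
    return ranked_values[lower] + frac * (ranked_values[lower + 1] - ranked_values[lower])

n, p = 575, 0.99
rank = p * (n + 1)                    # location of the 99th percentile
y_hat = value_at_rank(rank, TOP_BOD)  # interpolated estimate
print(f"rank = {rank:.2f}, estimate = {y_hat:.0f}")  # rank = 570.24, estimate = 7171
```

Only the two observations that bracket the fractional rank enter the estimate, which is the limitation discussed next.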
                        The disadvantage of this method is that only the few largest observed values are used to estimate the
                       percentile. The lower values are not used, except as they contribute to ranking the large values. Discarding
                       these lower values throws away information that could be used to get more precise parameter estimates
                       if the shape of the population distribution could be identified and used to make a parametric estimate.
                        Another disadvantage is that the data set must be large enough that extrapolation is unnecessary. A
                       95th percentile can be estimated from 20 observations, but a 99th percentile cannot be estimated with
                       fewer than 100 observations. The data set should be much larger than this minimum if the estimates are
                       to be reliable. The reason is apparent from a probability plot, which shows that the greatest
                       uncertainty is in the location of the extreme quantiles (the tails of the distribution). This uncertainty can
                       be expressed as confidence limits.
                        The confidence limits for quantiles that have been estimated using the nonparametric method can be
                       determined with the following formulas if n > 20 observations (Gilbert, 1987). Compute the rank order
                       of two-sided confidence limits (LCL and UCL):

                                              $$\text{Rank(LCL)} = p(n + 1) - z_{\alpha/2} \sqrt{np(1 - p)}$$
                                              $$\text{Rank(UCL)} = p(n + 1) + z_{\alpha/2} \sqrt{np(1 - p)}$$
                       The rank of the one-sided 1 – α upper confidence limit is obtained by computing:
                                              Rank UCL) =  pn +  1) +  z α np 1 –  p)
                                                            (
                                                  (
                                                                        (
                       Because Rank(UCL) and Rank(LCL) are usually not integers, the limits are obtained by linear
                       interpolation between the closest ordered values.
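The rank formulas are easy to compute directly. The sketch below is one possible implementation, assuming the normal-approximation form given above (Gilbert, 1987); the function name `quantile_rank_limits` and its parameters are illustrative, and the z value comes from Python's standard-library `statistics.NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist

def quantile_rank_limits(n, p, alpha=0.05, two_sided=True):
    """Return (Rank(LCL), Rank(UCL)) for the p-th quantile of n observations.

    For a one-sided upper limit (two_sided=False) the lower rank is None.
    Assumes n > 20, the condition stated for the approximation.
    """
    tail = alpha / 2 if two_sided else alpha
    z = NormalDist().inv_cdf(1 - tail)           # z_{alpha/2} or z_{alpha}
    center = p * (n + 1)                         # rank of the quantile estimate
    half_width = z * math.sqrt(n * p * (1 - p))
    lower = center - half_width if two_sided else None
    return lower, center + half_width

lcl_rank, ucl_rank = quantile_rank_limits(n=575, p=0.99, alpha=0.05)
print(round(lcl_rank, 1), round(ucl_rank, 1))
```

The returned ranks are generally fractional, so they still have to be converted to data values by interpolating between the bracketing ordered observations.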


                       Example 8.5

                           The 95% two-sided confidence limits for the Example 8.4 estimate of $\hat{y}_{0.99} = 7171$, for n = 575
                           observations and α = 0.05, are calculated using $z_{\alpha/2} = z_{0.025} = 1.96$ and
                                        $$\text{Rank(LCL)} = 0.99(576) - 1.96 \sqrt{575(0.99)(0.01)} = 565.6$$
                                        $$\text{Rank(UCL)} = 0.99(576) + 1.96 \sqrt{575(0.99)(0.01)} = 574.9$$
                           Interpolating between observations 565 and 566, and between observations 574 and 575, gives
                           LCL = 6038 and UCL = 10,547.
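The interpolation behind these two limits can be written out explicitly. The lines below are a worked check, assuming the ranks are used as rounded to one decimal place (565.6 and 574.9) and taking the ordered values from the Example 8.4 table; the notation $y_{(k)}$ for the observation of rank k is introduced here for convenience.

```latex
\begin{align*}
\text{LCL} &= y_{(565)} + 0.6\,\bigl(y_{(566)} - y_{(565)}\bigr)
            = 5977 + 0.6\,(6079 - 5977) = 6038.2 \approx 6038 \\
\text{UCL} &= y_{(574)} + 0.9\,\bigl(y_{(575)} - y_{(574)}\bigr)
            = 10385 + 0.9\,(10565 - 10385) = 10547
\end{align*}
```

The interval is strongly asymmetric about the estimate of 7171 because the two largest observations lie far above the rest.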
                       Comments
                       Quantiles and percentiles can be estimated using parametric or nonparametric methods. The nonparametric
                       method is simple, but the sample must contain more than p observations to estimate the pth percentile (and
                       still more observations if the upper confidence limits are needed). Use the nonparametric method whenever
                       you are unwilling or unable to specify a plausible distribution for the sample. Parametric estimates should
                       be made whenever the distribution can be identified because the estimates will be more precise than those
                       © 2002 By CRC Press LLC