Page 79 - Statistics for Environmental Engineers

L1592_frame_C08  Page 71  Tuesday, December 18, 2001  1:45 PM









                       8




                       Estimating Percentiles






                       KEY WORDS confidence intervals, distribution-free estimation, geometric mean, lognormal distribution,
                       normal distribution, nonparametric estimation, parametric estimation, percentile, quantile, rank
                       order statistics.

                       The use of percentiles in environmental standards and regulations has grown during the past few years.
                       England has water quality consent limits that are based on the 90th and 95th percentiles of monitoring
                       data not exceeding specified levels. The U.S. EPA has specifications for air quality monitoring that are,
                       in effect, percentile limitations. These may, for example, specify that the ambient concentration of a
                       compound cannot be exceeded more often than once a year (the 364/365th percentile). The U.S. EPA
                       has provided guidance for setting aquatic standards on toxic chemicals that require estimating 99th
                       percentiles and using this statistic to make important decisions about monitoring and compliance. The
                       agency has also used the 99th percentile to establish maximum daily limits for industrial effluents
                       (e.g., pulp and paper). Specifying a 99th percentile in a decision-making rule gives an impression of
                       great conservatism, or of having great confidence in making the “safe” and therefore correct
                       environmental decision. Unfortunately, the 99th percentile is a statistic that cannot be estimated precisely.
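
One way to see the difficulty is to count how little data falls beyond a high percentile. The sketch below assumes independent observations. It converts an allowed exceedance frequency into a quantile level and computes the chance that a sample of n observations contains even one value above the pth percentile. The function names and the sample size of 30 are illustrative and do not come from the text.

```python
def percentile_level(allowed_exceedances, periods):
    """Quantile level implied by 'at most k exceedances in m periods'."""
    return (periods - allowed_exceedances) / periods


def prob_any_above(p, n):
    """Chance that at least one of n independent observations exceeds the pth quantile."""
    return 1.0 - p ** n


once_a_year = percentile_level(1, 365)   # the 364/365th percentile
expected_above = 30 * (1 - 0.99)         # expected count above the 99th percentile, n = 30
chance_any = prob_any_above(0.99, 30)    # chance the sample reaches past it at all

print(f"once-a-year limit as a quantile: {once_a_year:.5f}")
print(f"expected observations above the 99th percentile (n = 30): {expected_above:.2f}")
print(f"chance that any observation exceeds it (n = 30): {chance_any:.3f}")
```

Under these assumptions, roughly three samples of size 30 in four contain no value above the 99th percentile, so an estimate from such a sample has to lean on a distributional assumption or on the largest few observations.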



                       Definition of Quantile and Percentile
                       The population distribution is the true underlying pattern. Figure 8.1 shows a lognormal population
                       distribution of y and the normal distribution that is obtained by the transformation x = ln(y). The
                       population 50th percentile (the median) and the 90th, 95th, and 99th percentiles are shown. The population
                       pth percentile, $y_p$, is a parameter that, in practice, is unknown and must be estimated from data. The
                       estimate of the percentile is denoted by $\hat{y}_p$. In this chapter, the parametric estimation method and
                       one nonparametric estimation method are shown.
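
The relationship between the two distributions can be sketched numerically. Because x = ln(y) is normal, a percentile of y is the exponential of the same percentile of x. The parameter values below (mean 1.0 and standard deviation 0.5 for x) are assumed for illustration and are not the values behind Figure 8.1. The function name is also hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_percentile(p, mu, sigma):
    """Population pth quantile of y when x = ln(y) is normal with mean mu and sd sigma."""
    x_p = NormalDist(mu, sigma).inv_cdf(p)  # quantile on the log (normal) scale
    return math.exp(x_p)                    # back-transform to the original scale


MU, SIGMA = 1.0, 0.5  # assumed parameters of x = ln(y), for illustration only

for p in (0.50, 0.90, 0.95, 0.99):
    print(f"p = {p:.2f}  y_p = {lognormal_percentile(p, MU, SIGMA):.3f}")
```

The median of y is exp(mu), and the upper percentiles spread out much more on the original scale than they do on the log scale.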
                        The pth quantile is a population parameter and is denoted by $y_p$. (Chapter 2 stated that parameters
                       would be indicated with Greek letters, but this convention is violated in this chapter.) By definition, a
                       proportion p of the population is smaller than or equal to $y_p$ and a proportion 1 – p is larger than $y_p$.
                       Quantiles are expressed as decimal fractions.
                        Quantiles expressed as percentages are called percentiles. For example, the 0.5 quantile is equivalent
                       to the 50th percentile; the 0.99 quantile is the 99th percentile. The 95th percentile will be denoted as $y_{95}$.
                        A quartile of the distribution contains one-fourth of the area under the frequency distribution (and
                       one-fourth of the data points). Thus, the distribution is divided into four equal areas by $y_{0.25}$ (the
                       lower quartile), $y_{0.5}$ (the 0.5 quantile, or median), and $y_{0.75}$ (the upper quartile).
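
The quartile definition can be checked on a small data set. The sketch below uses 20 hypothetical concentration values (invented for illustration) and Python's `statistics.quantiles`, which implements one of several sample-quantile conventions. Other conventions can give slightly different cut points.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical concentrations, invented for illustration only.
data = sorted([2.1, 3.4, 1.8, 5.6, 4.2, 7.9, 2.9, 3.8, 6.3, 9.5,
               1.2, 4.7, 5.1, 8.4, 2.5, 3.1, 6.8, 12.0, 4.4, 7.2])

# Three cut points: lower quartile, median, upper quartile.
q1, q2, q3 = quantiles(data, n=4)

# Count how many observations fall in each of the four groups.
cuts = [bisect_right(data, q) for q in (q1, q2, q3)]
counts = [cuts[0], cuts[1] - cuts[0], cuts[2] - cuts[1], len(data) - cuts[2]]

print("cut points:", q1, q2, q3)
print("observations per quarter:", counts)
```

Each of the four groups holds one-fourth of the observations, which is the sample counterpart of the four equal areas under the population distribution.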



                       Parametric Estimates of Quantiles

                       If we know or are willing to assume the population distribution, we can use a parametric method. Parametric
                       quantile (percentile) estimation will be discussed initially in terms of the normal distribution. The same
                       methods can be used on nonnormally distributed data after transformation to make them approximately
                       normal. This is convenient because the properties of the normal distribution are known and accessible in tables.
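
As a minimal sketch of the idea, a commonly used parametric estimate for normal data is the sample mean plus the standard normal deviate for p times the sample standard deviation. The chapter's own formulas are not part of this excerpt, so the form below is an assumption, as are the function names and the data. For lognormal data the same calculation is applied to the logarithms and the result is back-transformed.

```python
import math
from statistics import NormalDist, mean, stdev


def normal_percentile_estimate(values, p):
    """Assumed estimator: sample mean + z_p * sample standard deviation."""
    z_p = NormalDist().inv_cdf(p)  # standard normal deviate for quantile p
    return mean(values) + z_p * stdev(values)


def lognormal_percentile_estimate(values, p):
    """Apply the normal-theory estimate to ln(values), then back-transform."""
    logs = [math.log(v) for v in values]
    return math.exp(normal_percentile_estimate(logs, p))


# Hypothetical concentrations, invented for illustration only.
sample = [2.1, 3.4, 1.8, 5.6, 4.2, 7.9, 2.9, 3.8, 6.3, 9.5]

for p in (0.50, 0.90, 0.99):
    print(f"p = {p:.2f}  estimate = {lognormal_percentile_estimate(sample, p):.2f}")
```

At p = 0.5 the back-transformed estimate equals the geometric mean of the data, since the standard normal deviate is zero at the median.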


                       © 2002 By CRC Press LLC