Page 79 - Statistics for Environmental Engineers
8
Estimating Percentiles
KEY WORDS confidence intervals, distribution-free estimation, geometric mean, lognormal distribution,
normal distribution, nonparametric estimation, parametric estimation, percentile, quantile, rank
order statistics.
The use of percentiles in environmental standards and regulations has grown during the past few years.
England has water quality consent limits that are based on the 90th and 95th percentiles of monitoring
data not exceeding specified levels. The U.S. EPA has specifications for air quality monitoring that are,
in effect, percentile limitations. These may, for example, specify that the ambient concentration of a
compound cannot be exceeded more often than once a year (the 364/365th percentile). The U.S. EPA
has provided guidance for setting aquatic standards on toxic chemicals that require estimating 99th
percentiles and using this statistic to make important decisions about monitoring and compliance. The
agency has also used the 99th percentile to establish maximum daily limits for industrial effluents (e.g., pulp
and paper). Specifying a 99th percentile in a decision-making rule gives an impression of great conservatism,
or of having great confidence in making the “safe” and therefore correct environmental decision.
Unfortunately, the 99th percentile is a statistic that cannot be estimated precisely.
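The imprecision of an extreme percentile can be illustrated with a small simulation. The sketch below is not taken from the text: it assumes a lognormal population with hypothetical parameters (mean 0 and standard deviation 1 on the log scale), a sample size of 50, and a simple interpolated order-statistic estimator. It compares how much the estimated median and the estimated 99th percentile vary from sample to sample.

```python
import random
import statistics


def empirical_quantile(data, p):
    """Order-statistic estimate with linear interpolation between sorted values."""
    xs = sorted(data)
    h = (len(xs) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def relative_spread(p, n=50, trials=2000, seed=1):
    """Coefficient of variation of the quantile estimate over repeated samples.

    The population (lognormal, log-scale mean 0 and sd 1), n, and trials
    are illustrative assumptions, not values from the text.
    """
    rng = random.Random(seed)
    estimates = [
        empirical_quantile([rng.lognormvariate(0.0, 1.0) for _ in range(n)], p)
        for _ in range(trials)
    ]
    return statistics.stdev(estimates) / statistics.mean(estimates)


spread_median = relative_spread(0.50)
spread_p99 = relative_spread(0.99)
print(f"relative spread of estimated median:          {spread_median:.2f}")
print(f"relative spread of estimated 99th percentile: {spread_p99:.2f}")
```

Under these assumptions the 99th percentile estimate depends almost entirely on the one or two largest observations in each sample, which is why its spread is much larger than that of the median.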
Definition of Quantile and Percentile
The population distribution is the true underlying pattern. Figure 8.1 shows a lognormal population
distribution of y and the normal distribution that is obtained by the transformation x = ln(y). The
population 50th percentile (the median), and 90th, 95th, and 99th percentiles are shown. The population
pth percentile, $y_p$, is a parameter that, in practice, is unknown and must be estimated from data. The
estimate of the percentile is denoted by $\hat{y}_p$. In this chapter, the parametric estimation method and one
nonparametric estimation method are shown.
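The relationship between the lognormal distribution of y and the normal distribution of x = ln(y) can be sketched numerically. The parameter values below (mean 1.0 and standard deviation 0.5 for x) are hypothetical and are not the values used in Figure 8.1. The idea is that a percentile of x is found on the normal scale and then back-transformed with the exponential function to give the corresponding percentile of y.

```python
import math
from statistics import NormalDist

# Hypothetical population parameters of x = ln(y); not taken from Figure 8.1.
MU_X = 1.0
SIGMA_X = 0.5


def lognormal_percentile(p, mu=MU_X, sigma=SIGMA_X):
    """Population quantile of y when x = ln(y) is normal with mean mu and sd sigma."""
    x_p = NormalDist(mu, sigma).inv_cdf(p)  # quantile on the normal (log) scale
    return math.exp(x_p)                    # back-transform to the original scale


for p in (0.50, 0.90, 0.95, 0.99):
    print(f"p = {p:.2f}  y_p = {lognormal_percentile(p):.3f}")
```

Because the exponential function is monotonic, the ordering of the percentiles is preserved, and the median of y equals exp(mean of x).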
The pth quantile is a population parameter and is denoted by $y_p$. (Chapter 2 stated that parameters
would be indicated with Greek letters, but this convention is violated in this chapter.) By definition, a
proportion p of the population is less than or equal to $y_p$ and a proportion 1 – p is larger than $y_p$. Quantiles
are expressed as decimal fractions.
Quantiles expressed as percentages are called percentiles. For example, the 0.5 quantile is equivalent
to the 50th percentile; the 0.99 quantile is the 99th percentile. The 95th percentile will be denoted as $y_{95}$.
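The definition can be checked numerically. The sketch below assumes a standard normal population purely for illustration; the helper name percentile_to_quantile is hypothetical. It converts percentiles to quantiles, expresses the "once a year" limit from the introduction as a quantile, and confirms that a proportion p of the population lies at or below the pth quantile.

```python
from statistics import NormalDist


def percentile_to_quantile(percent):
    """Convert a percentile (e.g. 95) to the equivalent quantile (0.95)."""
    return percent / 100.0


# "Not exceeded more often than once a year" corresponds to the 364/365 quantile.
once_a_year = 364 / 365

# Illustrative population: standard normal (an assumption, not from the text).
population = NormalDist(0.0, 1.0)
p = percentile_to_quantile(99)
y_p = population.inv_cdf(p)            # the 0.99 quantile of the population
fraction_below = population.cdf(y_p)   # proportion of the population <= y_p

print(f"364/365 as a percentile: {100 * once_a_year:.2f}")
print(f"0.99 quantile: {y_p:.3f}, proportion at or below it: {fraction_below:.4f}")
```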
A quartile of the distribution contains one-fourth of the area under the frequency distribution (and
one-fourth of the data points). Thus, the distribution is divided into four equal areas by $y_{0.25}$ (the lower
quartile), $y_{0.5}$ (the 0.5 quantile, or median), and $y_{0.75}$ (the upper quartile).
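The statement that the three quartile cut points split the data into four equal parts can be illustrated with a sample. The data below are simulated from a hypothetical lognormal population, and the cut points are computed with the standard library function statistics.quantiles, whose default interpolation rule is only one of several conventions in use.

```python
import random
from statistics import quantiles

# Hypothetical sample of 1000 values from a lognormal population (assumed).
rng = random.Random(8)
data = [rng.lognormvariate(0.0, 1.0) for _ in range(1000)]

# Three cut points: lower quartile, median, upper quartile.
q1, q2, q3 = quantiles(data, n=4)

# Count the observations falling in each of the four parts.
counts = [
    sum(y <= q1 for y in data),
    sum(q1 < y <= q2 for y in data),
    sum(q2 < y <= q3 for y in data),
    sum(y > q3 for y in data),
]
print(f"lower quartile {q1:.3f}, median {q2:.3f}, upper quartile {q3:.3f}")
print("observations in each quarter:", counts)
```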
Parametric Estimates of Quantiles
If we know or are willing to assume the population distribution, we can use a parametric method. Parametric
quantile (percentile) estimation will be discussed initially in terms of the normal distribution. The same
methods can be used on nonnormally distributed data after transformation to make them approximately
normal. This is convenient because the properties of the normal distribution are known and accessible in tables.
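As a minimal sketch of the parametric approach, assuming the usual normal-theory estimate (sample mean plus the standard normal deviate z_p times the sample standard deviation), the functions below estimate a percentile directly for normal data and, for data assumed lognormal, after a log transformation followed by back-transformation. The sample values and function names are hypothetical, and the inverse normal CDF stands in for a table lookup.

```python
import math
from statistics import NormalDist, mean, stdev


def normal_percentile_estimate(data, p):
    """Estimate the p quantile assuming the data are normally distributed."""
    z_p = NormalDist().inv_cdf(p)  # standard normal deviate, replaces a table lookup
    return mean(data) + z_p * stdev(data)


def lognormal_percentile_estimate(data, p):
    """Estimate the p quantile assuming ln(y) is normally distributed."""
    logs = [math.log(y) for y in data]
    return math.exp(normal_percentile_estimate(logs, p))


# Hypothetical concentration measurements (illustration only).
sample = [2.1, 3.4, 1.8, 5.6, 2.9, 4.2, 3.1, 7.5, 2.4, 3.8]

for p in (0.50, 0.95, 0.99):
    print(
        f"p = {p:.2f}  normal: {normal_percentile_estimate(sample, p):.2f}  "
        f"lognormal: {lognormal_percentile_estimate(sample, p):.2f}"
    )
```

Under the lognormal assumption the estimated median is the geometric mean of the sample, which is why the two methods give different values even at p = 0.5.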
© 2002 By CRC Press LLC