Page 83 - Computational Statistics Handbook with MATLAB
for a continuous random variable and by
$$F(a) = \sum_{x_i \le a} f(x_i) \tag{3.37}$$
for a discrete random variable. In this section, we examine the sample analog
of the cumulative distribution function, called the empirical distribution
function. When it is not suitable to assume a distribution for the random
variable, we can use the empirical distribution function as an estimate of the
underlying distribution. This is a nonparametric estimate of the
distribution function, because we do not assume a specific parametric
form for the distribution that generates the random phenomena. In a
parametric setting, we would instead assume that a particular distribution
generated the sample and estimate the cumulative distribution function by
estimating the appropriate parameters.
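To make the contrast concrete, the following is a minimal sketch, not the book's own code. It assumes a small hypothetical sample `x` and an evaluation point `a`, and it assumes a normal model for the parametric case. The nonparametric estimate is simply the proportion of observations at or below `a`; the parametric estimate plugs the sample mean and standard deviation into the normal distribution function, written here with the base MATLAB function `erfc`.

```matlab
% Hypothetical sample and evaluation point (illustrative values only)
x = [-1.2 0.3 0.8 -0.5 1.9 0.1 -0.9 1.4];
a = 0.5;

% Nonparametric estimate: proportion of observations <= a
Femp = mean(x <= a);          % 5 of the 8 values are <= 0.5, so 0.625

% Parametric estimate, ASSUMING the data come from a normal distribution:
% estimate the parameters, then evaluate the normal CDF at a.
mu    = mean(x);
sigma = std(x);
Fpar  = 0.5 * erfc(-(a - mu) / (sigma * sqrt(2)));

fprintf('nonparametric: %.4f   parametric (normal): %.4f\n', Femp, Fpar);
```

The two numbers will generally differ; how much they differ depends on how well the assumed parametric family describes the data.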
The empirical distribution function is based on the order statistics. The
order statistics for a sample are obtained by putting the data in ascending
order. Thus, for a random sample of size n, the order statistics are defined as

$$X_{(1)} \le X_{(2)} \le \dots \le X_{(n)},$$

with $X_{(i)}$ denoting the i-th order statistic. The order statistics for a random
sample can be calculated easily in MATLAB using the sort function.
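As a brief illustration of the sort function mentioned above, here is a small sketch. The data values and the variable names `x` and `xs` are hypothetical and chosen only for this example.

```matlab
% Hypothetical sample of size n = 5
x  = [2.3 -0.7 1.1 0.4 -1.9];

% sort returns the values in ascending order, i.e. the order statistics
xs = sort(x);

% xs(i) is the i-th order statistic; xs(1) is the sample minimum
% and xs(end) is the sample maximum.
disp(xs)
```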
The empirical distribution function $\hat{F}_n(x)$ is defined as the number of data
points less than or equal to x, written $\#(X_i \le x)$, divided by the sample size n. It can
be expressed in terms of the order statistics as follows:

$$\hat{F}_n(x) = \begin{cases} 0; & x < X_{(1)} \\ j/n; & X_{(j)} \le x < X_{(j+1)} \\ 1; & x \ge X_{(n)}. \end{cases} \tag{3.38}$$
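The definition can be checked numerically. The sketch below uses a hypothetical sample and illustrative variable names (`Fhat`, `Fos`, `t`) that are not from the text. It evaluates the empirical distribution function in two ways: directly as a proportion, and through the order statistics by locating the index j for which the j-th order statistic is the largest one not exceeding the evaluation point.

```matlab
% Hypothetical sample
x  = [2.3 -0.7 1.1 0.4 -1.9];
n  = numel(x);
xs = sort(x);                  % order statistics

% Direct definition: number of data points <= t, divided by n
Fhat = @(t) sum(x <= t) / n;

% Order-statistic form: find j such that X_(j) <= t < X_(j+1)
t = 0.5;
j = find(xs <= t, 1, 'last');  % empty when t < X_(1)
if isempty(j), j = 0; end
Fos = j / n;

fprintf('direct: %.2f   via order statistics: %.2f\n', Fhat(t), Fos);
```

For this sample, three of the five values are at or below 0.5, so both forms give 3/5. Evaluating `Fhat` below the smallest observation returns 0, and at or above the largest observation it returns 1, matching the first and last cases of the piecewise definition.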
Figure 3.2 illustrates these concepts. We show the empirical cumulative
distribution function for a random sample from a standard normal distribution
and include the theoretical distribution function for comparison. In the
following section, we introduce a descriptive measure for a population called a
quantile, along with its corresponding estimate. Quantiles are introduced here
because they are based on the cumulative distribution function.
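A comparison like the one in Figure 3.2 can be sketched as follows. This is an assumption-laden illustration rather than the code used to produce the figure: the sample size of 1000 is arbitrary, the variable names are hypothetical, and the standard normal distribution function is computed with the base MATLAB function `erfc` so that no toolbox is required.

```matlab
% Draw a sample from a standard normal distribution (size is arbitrary)
n  = 1000;
x  = randn(1, n);
xs = sort(x);                          % order statistics

% The ECDF jumps to j/n at the j-th order statistic
Fhat = (1:n) / n;

% Theoretical standard normal CDF at the same points:
% Phi(x) = 0.5 * erfc(-x / sqrt(2))
Ftheory = 0.5 * erfc(-xs / sqrt(2));

% Largest discrepancy between the two, measured at the data points
maxdiff = max(abs(Fhat - Ftheory));
fprintf('max |Fhat - F| at the data points: %.4f\n', maxdiff);
```

To draw the picture, one could call `stairs(xs, Fhat)` for the empirical function and overlay `plot(xs, Ftheory)` after `hold on`. The discrepancy should shrink as n grows, since the empirical distribution function converges to the underlying distribution function.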
Quantiles
Quantiles have a fundamental role in statistics. For example, they can be used
as a measure of central tendency and dispersion, they provide the critical val-
© 2002 by Chapman & Hall/CRC