Page 83 - Computational Statistics Handbook with MATLAB

Chapter 3: Sampling Concepts                                     69


                             for a continuous random variable and by


                             \[ F(a) = \sum_{x_i \le a} f(x_i) \tag{3.37} \]
                             for a discrete random variable. In this section, we examine the sample analog
                             of the cumulative distribution function called the empirical distribution
                             function. When it is not suitable to assume a distribution for the random
                             variable, then we can use the empirical distribution function as an estimate of the
                             underlying distribution. One can call this a nonparametric estimate of the
                             distribution function, because we are not assuming a specific parametric
                             form for the distribution that generates the random phenomena. In a
                             parametric setting, we would assume a particular distribution generated the
                             sample and estimate the cumulative distribution function by estimating the
                             appropriate parameters.
                              The empirical distribution function is based on the order statistics. The
                             order statistics for a sample are obtained by putting the data in ascending
                             order. Thus, for a random sample of size n, the order statistics are defined as
                             \[ X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}, \]

                             with $X_{(i)}$ denoting the i-th order statistic. The order statistics for a random
                             sample can be calculated easily in MATLAB using the sort function.
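
As a minimal sketch of the sorting step just described, the following uses a small made-up sample. The data values and the variable names x, xs, x_min, and x_max are illustrative assumptions, not taken from the text.

```matlab
% Hypothetical sample of size n = 5 (values chosen only for illustration).
x = [2.3 -0.7 1.1 0.4 -1.5];
n = length(x);

% The order statistics X_(1) <= ... <= X_(n) are the data in ascending order.
xs = sort(x);

% The i-th order statistic is the i-th element of the sorted vector.
x_min = xs(1);   % X_(1), the sample minimum
x_max = xs(n);   % X_(n), the sample maximum
```

Here xs(i) plays the role of the i-th order statistic, so xs(1) and xs(n) are the smallest and largest observations.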
                              The empirical distribution function $\hat{F}_n(x)$ is defined as the number of data
                             points less than or equal to x, $\#(X_i \le x)$, divided by the sample size n. It can
                             be expressed in terms of the order statistics as follows:


                             \[
                             \hat{F}_n(x) =
                             \begin{cases}
                             0; & x < X_{(1)} \\
                             j/n; & X_{(j)} \le x < X_{(j+1)} \\
                             1; & x \ge X_{(n)}.
                             \end{cases}
                             \tag{3.38}
                             \]
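
The two descriptions of the empirical distribution function can be checked against each other numerically. The sketch below evaluates it at a single point, once by counting and once through the order statistics. The sample, the evaluation point a, and all variable names are assumptions made for illustration.

```matlab
% Hypothetical sample (illustrative values only).
x = [2.3 -0.7 1.1 0.4 -1.5];
n = length(x);

% Counting form: number of data points <= a, divided by n.
Fhat = @(a) sum(x <= a) / n;

% Evaluate at an arbitrary point a.
a = 0.5;
F_def = Fhat(a);

% Order-statistic form: find the largest j with X_(j) <= a.
xs = sort(x);
j = find(xs <= a, 1, 'last');
if isempty(j)
    j = 0;       % a lies below X_(1), so the estimate is 0
end
F_os = j / n;    % j/n; equals 1 when a >= X_(n)
```

For this sample, three of the five observations are at most 0.5, so both F_def and F_os equal 3/5.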
                             Figure 3.2 illustrates these concepts. We show the empirical cumulative
                             distribution function for a sample from a standard normal distribution and
                             include the theoretical distribution function for comparison. In the following
                             section, we describe a descriptive measure for a population called a quantile,
                             along with its corresponding estimate. Quantiles are introduced here because
                             they are based on the cumulative distribution function.
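
A comparison of this kind can be sketched as follows. This is not necessarily how the figure was produced. The sample size n = 100, the use of erfc for the standard normal distribution function, and the variable names are all assumptions.

```matlab
% Generate a standard normal sample; n = 100 is an arbitrary choice.
n = 100;
x = randn(1, n);
xs = sort(x);

% At the j-th order statistic the empirical distribution function
% equals j/n (assuming no ties, which is the usual case for continuous data).
Fhat = (1:n) / n;

% Theoretical standard normal distribution function, written with the
% complementary error function so that no toolbox is needed.
Ftheo = 0.5 * erfc(-xs / sqrt(2));

% Largest absolute gap between the two curves at the order statistics.
maxdiff = max(abs(Fhat - Ftheo));

% Step plot of the empirical function with the theoretical curve overlaid.
stairs(xs, Fhat)
hold on
plot(xs, Ftheo, '--')
hold off
xlabel('x')
ylabel('F(x)')
legend('Empirical', 'Theoretical', 'Location', 'northwest')
```

Because the sample is random, the value of maxdiff changes from run to run. It tends to shrink as n grows.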



                             Quantiles
                             Quantiles have a fundamental role in statistics. For example, they can be used
                             as a measure of central tendency and dispersion, they provide the critical val-
                            © 2002 by Chapman & Hall/CRC