Page 122 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

3.6 Bootstrap Estimation


           about the bootstrap technique is that it also often works for other statistics for
           which no theory on sampling distribution is available. As a matter of fact, the
           bootstrap distribution usually (for a not too small original sample size, say n > 50)
           has the same shape and spread as the original sampling distribution, but is
           centred at the original statistic value rather than the true parameter value.
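
The centring and spread described above can be checked numerically. The following is a minimal sketch, not the book's own procedure: it uses a synthetic right-skewed sample (an assumption, not the PRT data), and the helper name `bootstrap_means` is hypothetical. It compares the centre of the bootstrap means with the original sample mean, and the bootstrap standard error with the usual s/√n.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic, right-skewed sample standing in for real data (assumption).
n = 100
x = [rng.expovariate(1 / 100) for _ in range(n)]


def bootstrap_means(sample, m, rng):
    """Return the means of m resamples drawn with replacement from sample."""
    size = len(sample)
    return [statistics.fmean(rng.choices(sample, k=size)) for _ in range(m)]


boot = bootstrap_means(x, m=1000, rng=rng)

original_mean = statistics.fmean(x)
boot_centre = statistics.fmean(boot)        # centre of the bootstrap distribution
bias_boot = boot_centre - original_mean     # bootstrap estimate of bias
se_boot = statistics.stdev(boot)            # bootstrap standard error
se_theory = statistics.stdev(x) / n ** 0.5  # s / sqrt(n), known theory for the mean

print(f"original mean = {original_mean:.2f}, bootstrap centre = {boot_centre:.2f}")
print(f"SE_boot = {se_boot:.2f}, s/sqrt(n) = {se_theory:.2f}")
```

The bootstrap centre should sit close to the original sample mean (not the population mean of 100 used to generate the data), and the two standard errors should be of similar size.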


             Figure 3.7.  a) Histogram of the PRT data (horizontal axis: x; vertical axis: n);
             b) Histogram of the bootstrap means (horizontal axis: x*; vertical axis: n).


              Suppose that the bootstrap distribution of a statistic, w, is approximately normal
           and that the bootstrap estimate of bias is small. We then compute a two-sided
           bootstrap confidence interval at α risk, for the parameter that corresponds to the
           statistic, by the following formula:

              $$ w \pm t_{n-1,\,1-\alpha/2}\,\mathrm{SE}_{\mathrm{boot}} $$

              We may use the percentiles of the normal distribution, instead of the Student’s t
           distribution, whenever m is very large.
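
As an illustration of the interval formula above, here is a minimal sketch using only the Python standard library. The function names `bootstrap_se` and `bootstrap_interval` are hypothetical, and the data are synthetic. Since the standard library has no Student's t quantile function, the t percentile is passed in as a value read from tables (here t with 49 degrees of freedom at 0.975, approximately 2.0096); when it is omitted, the normal percentile is used instead.

```python
import random
import statistics
from statistics import NormalDist


def bootstrap_se(sample, stat, m, rng):
    """Standard deviation of stat over m resamples drawn with replacement."""
    size = len(sample)
    replicates = [stat(rng.choices(sample, k=size)) for _ in range(m)]
    return statistics.stdev(replicates)


def bootstrap_interval(w, se_boot, alpha=0.05, t_quantile=None):
    """Return (w - q*SE_boot, w + q*SE_boot).

    q is the supplied Student's t percentile, or the normal
    percentile at 1 - alpha/2 when t_quantile is None.
    """
    if t_quantile is not None:
        q = t_quantile
    else:
        q = NormalDist().inv_cdf(1 - alpha / 2)
    return w - q * se_boot, w + q * se_boot


rng = random.Random(1)
x = [rng.gauss(50, 10) for _ in range(50)]  # synthetic sample, n = 50 (assumption)

w = statistics.fmean(x)                     # statistic computed on the original sample
se = bootstrap_se(x, statistics.fmean, m=500, rng=rng)

t_49 = 2.0096                               # t_{49, 0.975}, taken from tables
ci_t = bootstrap_interval(w, se, t_quantile=t_49)
ci_z = bootstrap_interval(w, se, alpha=0.05)

print("t-based interval:     ", ci_t)
print("normal-based interval:", ci_z)
```

The t-based interval is slightly wider than the normal-based one, because the t percentile exceeds the normal percentile for any finite number of degrees of freedom.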
              The question naturally arises of how large the number of bootstrap samples, m,
           must be in order to obtain a reliable bootstrap distribution with reliable values
           of SE_boot. A good rule of thumb, based on theoretical and practical evidence,
           is to choose m ≥ 200.
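
The effect of m on the stability of SE_boot can be explored with a small simulation. This is only a sketch under stated assumptions (synthetic normal data, the mean as statistic, hypothetical helper names `bootstrap_se` and `se_spread`); it illustrates the rule of thumb rather than proving it. For each m, the bootstrap is repeated several times on the same sample and the relative spread of the resulting SE_boot values is reported.

```python
import random
import statistics


def bootstrap_se(sample, stat, m, rng):
    """Standard deviation of stat over m resamples drawn with replacement."""
    size = len(sample)
    replicates = [stat(rng.choices(sample, k=size)) for _ in range(m)]
    return statistics.stdev(replicates)


rng = random.Random(2)
x = [rng.gauss(0, 1) for _ in range(50)]  # synthetic sample (assumption)


def se_spread(m, repeats=30):
    """Coefficient of variation of SE_boot over repeated bootstrap runs."""
    ses = [bootstrap_se(x, statistics.fmean, m, rng) for _ in range(repeats)]
    return statistics.stdev(ses) / statistics.fmean(ses)


spread = {m: se_spread(m) for m in (25, 200, 800)}
for m, cv in spread.items():
    print(f"m = {m:4d}: relative spread of SE_boot = {cv:.3f}")
```

With very few bootstrap samples, SE_boot changes noticeably from run to run; the run-to-run variation shrinks as m grows.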
              The following examples illustrate the computation of confidence intervals using
           the bootstrap technique.


           Example 3.9
           Q: Consider the percentage of lime, CaO, in the composition of clays, a sample of
           which constitutes the Clays’ dataset. Compute the 95% confidence interval of the
           two-tail 5% trimmed mean and discuss the results. (The two-tail 5% trimmed mean
           disregards 10% of the cases, 5% at each of the tails.)
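
The definition of the two-tail trimmed mean given in parentheses above can be sketched as follows. The function name `trimmed_mean` is hypothetical, the values are made up (they are not the Clays' data), and truncating the number of dropped cases to an integer is an assumption; statistical packages may handle non-integer counts differently.

```python
import statistics


def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of the cases at each tail."""
    ordered = sorted(values)
    # Number of cases dropped per tail; truncation is an assumption.
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k]
    return statistics.fmean(kept)


# Made-up values with one gross outlier.
data = list(range(1, 20)) + [1000]

print(statistics.fmean(data))  # 59.5, pulled up by the outlier
print(trimmed_mean(data))      # 10.5, one case dropped at each tail
```

Passing such a function as the statistic to a bootstrap routine yields the bootstrap distribution of the trimmed mean, for which no simple sampling theory is at hand.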