Page 121 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

3 Estimating Data Parameters


   The bootstrap idea (Efron, 1979) is to mimic the sampling distribution of the
statistic of interest through the use of many resamples with replacement of the
original sample. In the present chapter we restrict ourselves to illustrating the
idea as applied to the computation of confidence intervals (bootstrap techniques
cover a much wider area than confidence interval computation alone). Let us then
illustrate the bootstrap computation of confidence intervals by applying it to the
mean of the n = 50 PRT measurements for Class = 1 of the cork stoppers'
dataset (as in Example 3.1). The histogram of these data is shown in Figure 3.7a.
   Denoting by X the associated random variable, we compute the sample mean of
the data as x̄ = 365.0. The standard error of the mean is SE = s/√n = 15.6.
Since the dataset size, n, is not that large, one may have some suspicion
concerning the bias of this estimate and the accuracy of a confidence interval
based on the normality assumption.
   Let us now consider extracting at random and with replacement m = 1000
samples of size n = 50 from the original dataset. These resamples are called
bootstrap samples. Let us further consider that for each bootstrap sample we
compute its mean x̄. Figure 3.7b shows the histogram² of the bootstrap
distribution of the means. We see that this histogram looks similar to the normal
distribution. As a matter of fact, the bootstrap distribution of a statistic
usually mimics the sampling distribution of that statistic, which in this case
happens to be normal.
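The resampling step just described can be sketched in Python. This is only an illustration: the actual PRT measurements are not reproduced here, so simulated data with a mean and spread similar to the text's example stand in for them.

```python
import random

random.seed(0)

# Hypothetical stand-in for the n = 50 PRT measurements (the actual
# cork-stoppers data is not reproduced here): simulated values with a
# mean near 365 and a spread similar to the text's example.
data = [random.gauss(365.0, 110.0) for _ in range(50)]

n, m = len(data), 1000

# Draw m bootstrap samples of size n, with replacement, and record
# each resample's mean.
boot_means = [sum(random.choices(data, k=n)) / n for _ in range(m)]
```

A histogram of `boot_means` would then approximate the bootstrap distribution shown in Figure 3.7b.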
   Let us denote each bootstrap mean by x̄*. The mean and standard deviation of
the 1000 bootstrap means are computed as:

$$\bar{x}_{boot} = \frac{1}{m}\sum \bar{x}^{*} = \frac{1}{1000}\sum \bar{x}^{*} = 365.1,$$

$$s_{\bar{x},boot} = \sqrt{\frac{1}{m-1}\sum \left(\bar{x}^{*} - \bar{x}_{boot}\right)^{2}} = 15.47,$$

           where the summations extend to the m = 1000 bootstrap samples.
   We see that the mean of the bootstrap distribution is quite close to the
original sample mean; there is a bias of only x̄_boot − x̄ = 0.1. It can be shown
that this is usually the size of the bias that can be expected between x̄ and the
true population mean, µ. This property is not exclusive to the bootstrap
distribution of the mean; it applies to other statistics as well.
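The quantities above, together with a normal-approximation confidence interval based on them, can be computed directly. The sketch below again uses simulated stand-in data (the PRT values themselves are not reproduced here), so the resulting numbers will differ slightly from the text's 365.1 and 15.47.

```python
import math
import random

random.seed(0)

# Hypothetical stand-in for the n = 50 PRT measurements.
data = [random.gauss(365.0, 110.0) for _ in range(50)]
n, m = len(data), 1000

x_bar = sum(data) / n  # original sample mean
boot_means = [sum(random.choices(data, k=n)) / n for _ in range(m)]

# Mean of the bootstrap distribution and its bias relative to x_bar.
x_boot = sum(boot_means) / m
bias = x_boot - x_bar

# Bootstrap standard error: sample standard deviation of the m
# bootstrap means (divisor m - 1, as in the formula above).
se_boot = math.sqrt(sum((b - x_boot) ** 2 for b in boot_means) / (m - 1))

# Theory-based standard error s / sqrt(n), for comparison.
s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))
se = s / math.sqrt(n)

# Normal-approximation 95% confidence interval using SE_boot
# (1.96 is the 0.975 quantile of the standard normal).
ci = (x_bar - 1.96 * se_boot, x_bar + 1.96 * se_boot)
```

As the text notes, `se_boot` comes out close to the theory-based `se`, and the interval `ci` is practically the one obtained under the normality assumption.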
   The sample standard deviation of the bootstrap distribution, called the
bootstrap standard error and denoted SE_boot, is also quite close to the
theory-based estimate SE = s/√n. We could now use SE_boot to compute a
confidence interval for the mean. In the case of the mean there is not much
advantage in doing so (we would get practically the same result as in Example
3.1), since we have the Central Limit Theorem on which to base our confidence
interval computations. The good thing


² We should more rigorously say "one possible histogram", since different
  histograms are possible depending on the resampling process. For n and m
  sufficiently large they are, however, close to each other.