Page 126 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R


3.6 Bootstrap Estimation 105

In Example 3.11 above, the histogram in Figure 3.12 does not look well
approximated by the normal distribution. In fact, any of the goodness-of-fit tests
described in section 5.1 will reject the normality hypothesis. This is a common
difficulty when estimating bootstrap confidence intervals for the median; an
explanation of its causes can be found in, e.g., (Hesterberg T et al., 2003). The
difficulty is even more severe when the data size n is small (see Exercise 3.20).
Nevertheless, for data sizes larger than, say, 100 cases, and for a large number of
resamples, one can still rely on bootstrap estimates of the median as in
Example 3.11.
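A small experiment makes one symptom of this difficulty visible. The sketch below (Python standard library only; the sample values and the helper name `bootstrap_medians` are hypothetical, not taken from the book) draws bootstrap resamples and records the median of each. When n is odd, the median of every resample is one of the original observations, so the bootstrap distribution of the median is concentrated on at most n distinct values, which a continuous normal curve cannot follow closely when n is small.

```python
import random
import statistics

def bootstrap_medians(data, m=1000, seed=0):
    """Return m bootstrap replicates of the sample median (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n values with replacement from the original data
    return [statistics.median(rng.choices(data, k=n)) for _ in range(m)]

# Hypothetical small sample with odd n; NOT data from the book
sample = [2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 7.2, 8.1, 9.6]
meds = bootstrap_medians(sample, m=1000)

# With odd n each resample median is an original value, so the
# bootstrap distribution has at most n distinct support points.
distinct = sorted(set(meds))
print(len(distinct), distinct)
```

Increasing the size of `sample` spreads the replicates over more support points, in line with the remark that larger data sizes make the bootstrap estimate of the median more usable.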

Example 3.12
Q: Consider the variables Al2O3 and K2O of the Clays' dataset (n = 94 cases).
Using the bootstrap method, compute the 95% confidence interval (5% significance
level) of their Pearson correlation.
A: The sample Pearson correlation of Al2O3 and K2O is r ≡ w = 0.6922. The
histogram of the bootstrap distribution of the Pearson correlation with m = 1000
resamples is shown in Figure 3.13. It is well approximated by the normal
distribution. From the bootstrap distribution we compute:

$w_{\text{boot}} = 0.6950$
$SE_{\text{boot}} = 0.0719$

The bias $w_{\text{boot}} - w = 0.6950 - 0.6922 = 0.0028$ is quite small (about 0.4% of
the correlation value). We therefore compute the bootstrap confidence interval of the
Pearson correlation as:

$w \pm t_{93,\,0.975}\, SE_{\text{boot}} = 0.6922 \pm 1.9858 \times 0.0719 = 0.69 \pm 0.14$
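The interval arithmetic can be checked with a few lines of Python. This sketch hard-codes the values of Example 3.12, including the quantile $t_{93,0.975} = 1.9858$ quoted in the text; if SciPy is available, the same quantile can be obtained with `scipy.stats.t.ppf(0.975, 93)`. The helper name `se_based_interval` is hypothetical, and it implements only the standard-error-based interval used here, not the percentile or bootstrap-t methods.

```python
def se_based_interval(w, se_boot, t_quantile):
    """Return (lower, upper) = w -/+ t_quantile * se_boot."""
    half_width = t_quantile * se_boot
    return w - half_width, w + half_width

# Values from Example 3.12: w = 0.6922, SE_boot = 0.0719, t_{93,0.975} = 1.9858
lo, hi = se_based_interval(0.6922, 0.0719, 1.9858)
print(round(lo, 2), round(hi, 2))  # → 0.55 0.83
```

The half-width is 1.9858 × 0.0719 ≈ 0.1428, which rounds to the ± 0.14 of the text.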

[Figure 3.13: histogram of the bootstrap replicates w* (horizontal axis, 0.45 to 0.95) against counts n (vertical axis, 0 to 300)]
Figure 3.13. Histogram of the bootstrap distribution of the Pearson correlation
between the variables Al2O3 and K2O of the Clays’ dataset (1000 resamples).
