Page 264 - Computational Statistics Handbook with MATLAB

not appear. Efron and Tibshirani [1993] show that if $n \geq 10$ and $B \geq 20$, then the probability is low that every bootstrap sample contains a given point $x_i$. We estimate the value of $\hat{\gamma}_{B(-i)}$ by taking the bootstrap replicates for samples that do not contain the data point $x_i$. These steps are outlined below.
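The claim follows from a short calculation. Each of the $n$ draws misses a fixed point $x_i$ with probability $1 - 1/n$, so a bootstrap sample omits $x_i$ with probability $(1 - 1/n)^n$. Because the $B$ samples are drawn independently, every one of them contains $x_i$ with probability $[1 - (1 - 1/n)^n]^B$, which for $n = 10$ and $B = 20$ is about $1.9 \times 10^{-4}$. The MATLAB sketch below is our own illustration (the variable names are hypothetical): it evaluates these expressions and adds a quick simulation that counts how many of $B$ resampled index vectors leave out the first point.

```matlab
% Probability that a fixed point x_i is missing from one bootstrap sample
% of size n, and that it appears in every one of B independent samples.
n = 10;                          % sample size from the Efron-Tibshirani bound
B = 20;                          % number of bootstrap replicates
pMissing = (1 - 1/n)^n;          % P(x_i not drawn in n draws with replacement)
pInAll = (1 - pMissing)^B;       % P(every one of the B samples contains x_i)
fprintf('P(x_i absent from one sample) = %.4f\n', pMissing);
fprintf('P(x_i present in all %d samples) = %.2e\n', B, pInAll);

% Monte Carlo check: draw B bootstrap index vectors and count how many
% of them leave out point i = 1.
rng(0);                          % fixed seed for reproducibility
idx = randi(n, n, B);            % column b holds the indices of sample b
nWithout = sum(~any(idx == 1, 1));
fprintf('Samples without x_1: %d of %d\n', nWithout, B);
```

On average about a third of the samples (roughly $0.35B$ here) omit any given point, which is why the leave-one-out quantities in the procedure can be computed from the existing replicates.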
PROCEDURE - JACKKNIFE-AFTER-BOOTSTRAP

1. Given a random sample $x = (x_1, \dots, x_n)$, calculate a statistic of interest, $\hat{\theta}$.
2. Sample with replacement from the original sample to get a bootstrap sample $x^{*b} = (x_1^{*b}, \dots, x_n^{*b})$.
3. Using the sample obtained in step 2, calculate the same statistic that was determined in step 1 and denote it by $\hat{\theta}^{*b}$.
4. Repeat steps 2 and 3 $B$ times to estimate the distribution of $\hat{\theta}$.
5. Estimate the desired feature of the distribution of $\hat{\theta}$ (e.g., standard error, bias, etc.) by calculating the corresponding feature of the distribution of $\hat{\theta}^{*b}$. Denote this bootstrap estimated feature as $\hat{\gamma}_B$.
6. Now get the error in $\hat{\gamma}_B$. For $i = 1, \dots, n$, find all samples $x^{*b} = (x_1^{*b}, \dots, x_n^{*b})$ that do not contain the point $x_i$. These are the bootstrap samples that can be used to calculate $\hat{\gamma}_{B(-i)}$.
7. Calculate the estimate of the variance of $\hat{\gamma}_B$ using Equation 7.21.
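The procedure can be sketched in base MATLAB without the Statistics Toolbox. This is our own illustration, not the book's code, and it rests on several assumptions: the data x are simulated, the statistic is the sample mean, the feature $\hat{\gamma}_B$ is the bootstrap standard error, and Equation 7.21 is assumed to be the usual jackknife variance estimate $\frac{n-1}{n}\sum_{i=1}^{n}(\hat{\gamma}_{B(-i)} - \bar{\gamma}_B)^2$, where $\bar{\gamma}_B$ is the mean of the $\hat{\gamma}_{B(-i)}$. The names statfun, bootidx, thetaStar, and gammaMinus are hypothetical.

```matlab
% Jackknife-after-bootstrap sketch (base MATLAB, no toolboxes).
% Assumptions: the statistic is the sample mean, gamma_B is the bootstrap
% standard error, and Equation 7.21 is taken to be the usual jackknife
% variance (n-1)/n * sum((g_i - mean(g)).^2).
rng(1);                           % fixed seed for reproducibility
x = randn(25, 1);                 % hypothetical data, not from the text
n = numel(x);
B = 1000;
statfun = @(v) mean(v);           % statistic of interest, theta-hat

% Steps 2-4: draw B bootstrap samples, keeping the indices of each one.
bootidx = randi(n, n, B);         % column b holds the indices of sample b
thetaStar = zeros(B, 1);
for b = 1:B
    thetaStar(b) = statfun(x(bootidx(:, b)));
end

% Step 5: bootstrap estimate of the standard error.
gammaB = std(thetaStar);

% Step 6: for each i, reuse only the replicates whose samples omit x_i.
gammaMinus = zeros(n, 1);
for i = 1:n
    keep = ~any(bootidx == i, 1);     % samples that do not contain x_i
    gammaMinus(i) = std(thetaStar(keep));
end

% Step 7: jackknife estimate of the variance of gammaB (assumed Eq. 7.21).
varJack = (n - 1)/n * sum((gammaMinus - mean(gammaMinus)).^2);
fprintf('Bootstrap SE: %.4f, jackknife SE of that SE: %.4f\n', ...
    gammaB, sqrt(varJack));
```

Storing the indices in bootidx, rather than only the resampled values, is what makes step 6 cheap: the same $B$ replicates are reused for every $i$, and no new bootstrap samples are drawn. If some point happened to appear in every sample, thetaStar(keep) would be empty and the estimate for that $i$ would be undefined (NaN in MATLAB), so $B$ should be large enough that each point is left out many times.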
Example 7.9
In this example, we show how to implement the jackknife-after-bootstrap procedure. For simplicity, we will use the MATLAB Statistics Toolbox function called bootstrp, because it returns the indices for each bootstrap sample and the corresponding bootstrap replicate $\hat{\theta}^{*b}$. We return now to the law data, where our statistic is the sample correlation coefficient. Recall that we wanted to estimate the standard error of the correlation coefficient, so $\hat{\gamma}_B$ will be the bootstrap estimate of the standard error.
% Use the law data.
load law
lsat = law(:,1);
gpa = law(:,2);

% Use the example in MATLAB documentation.
B = 1000;
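[bootstat,bootsam] = bootstrp(B,'corrcoef',lsat,gpa);
% corrcoef returns a 2-by-2 matrix, which bootstrp stores as a row of
% bootstat, so column 2 of bootstat holds the correlation replicates.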
The output argument bootstat contains the B bootstrap replicates of the statistic we are interested in, and the columns of bootsam contain the indices of the data points that were in each bootstrap sample. We can loop
