Exercises
7.1. The insulate data set [Hand, et al., 1994] contains observations
corresponding to the average outside temperature in degrees Celsius
and the amount of weekly gas consumption measured in 1000 cubic
feet. Do a scatterplot of the data corresponding to the measurements
taken before insulation was installed. What is a good model for this?
Use cross-validation with K = 1 to estimate the prediction error for
your model. Then use cross-validation with K = 4. Does your error change
significantly? Repeat the process for the data taken after insulation
was installed.
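One possible starting point for the cross-validation in this exercise is sketched
below in MATLAB. The variable names temp and gas are assumptions; load the
before-insulation portion of the insulate data in whatever form accompanies your
copy of the data sets, and repeat the run with K = 4.

   % Assumed: temp and gas are column vectors holding the temperatures
   % and weekly gas consumption recorded before insulation was installed.
   n = length(temp);
   K = 1;                            % repeat with K = 4
   ind = randperm(n);                % random assignment to partitions
   nparts = floor(n/K);              % number of partitions of size K
   sqerr = zeros(1,n);
   for i = 1:nparts
      test  = ind((i-1)*K+1 : i*K);  % observations withheld on this pass
      train = setdiff(1:n, test);
      p = polyfit(temp(train), gas(train), 1);   % straight-line model
      sqerr(test) = (gas(test) - polyval(p, temp(test))).^2;
   end
   used = ind(1:nparts*K);           % observations actually withheld
   PE = mean(sqerr(used))            % estimated prediction error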
7.2. Using the same procedure as in Example 7.2, use a quadratic
(degree 2) and a cubic (degree 3) polynomial to build the model. What
is the estimated prediction error for these models? Which one seems
best: linear, quadratic, or cubic?
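A sketch along the same lines, assuming x and y hold the data used in Example 7.2,
wraps a leave-one-out cross-validation loop in an outer loop over the polynomial
degree:

   % Assumed: x and y are column vectors holding the data of Example 7.2.
   n = length(x);
   PE = zeros(1,3);
   for d = 1:3                       % linear, quadratic, and cubic fits
      sqerr = zeros(1,n);
      for i = 1:n                    % leave-one-out cross-validation
         train = [1:i-1, i+1:n];
         p = polyfit(x(train), y(train), d);
         sqerr(i) = (y(i) - polyval(p, x(i)))^2;
      end
      PE(d) = mean(sqerr);
   end
   PE                                % smallest entry suggests the best degree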
7.3. The peanuts data set [Hand, et al., 1994; Draper and Smith, 1981]
contains measurements of the aflatoxin level (X) and the corresponding
percentage of non-contaminated peanuts in the batch (Y). Do a
scatterplot of these data. What is a good model for these data? Use
cross-validation to choose the best model.
7.4. Generate n = 25 random variables from a standard normal
distribution that will serve as the random sample. Determine the
jackknife estimate of the standard error for x̄, and calculate the
bootstrap estimate of the standard error. Compare these to the
theoretical value of the standard error (see Chapter 3).
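A minimal sketch of both resampling estimates, written with explicit loops rather
than any toolbox routine, is given below; the number of bootstrap replicates
B = 500 is an arbitrary choice.

   n = 25;
   x = randn(n,1);                    % the random sample
   % Jackknife: recompute the mean with each observation left out.
   thetajack = zeros(n,1);
   for i = 1:n
      thetajack(i) = mean(x([1:i-1, i+1:n]));
   end
   sejack = sqrt((n-1)/n * sum((thetajack - mean(thetajack)).^2));
   % Bootstrap: resample with replacement B times.
   B = 500;
   thetaboot = zeros(B,1);
   for b = 1:B
      thetaboot(b) = mean(x(ceil(n*rand(n,1))));
   end
   seboot = std(thetaboot);
   % Theoretical standard error of the sample mean: sigma/sqrt(n) = 0.2.
   fprintf('jackknife %g  bootstrap %g  theory %g\n', sejack, seboot, 1/sqrt(n))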
7.5. Using a sample size of n = 15, generate random variables from a
uniform (0,1) distribution. Determine the jackknife estimate of the
standard error for x̄, and calculate the bootstrap estimate of the
standard error for the same statistic. Suppose we decide to use
s/√n as an estimate of the standard error for x̄. How does this
compare to the other estimates?
7.6. Use Monte Carlo simulation to compare the performance of the boot-
strap and the jackknife methods for estimating the standard error and
bias of the sample second central moment. For every Monte Carlo
trial, generate 100 standard normal random variables and calculate
the bootstrap and jackknife estimates of the standard error and bias.
Show the distribution of the bootstrap estimates (of bias and standard
error) and the jackknife estimates (of bias and standard error) in a
histogram or a box plot. Make some comparisons of the two methods.
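One possible simulation loop is sketched below; the number of Monte Carlo trials
M and bootstrap replicates B are arbitrary choices, and hist could be replaced by
boxplot if the Statistics Toolbox is available.

   M = 100;  n = 100;  B = 200;
   m2 = @(v) mean((v - mean(v)).^2);  % sample second central moment
   sehatboot = zeros(M,1);  biasboot = zeros(M,1);
   sehatjack = zeros(M,1);  biasjack = zeros(M,1);
   for m = 1:M
      x = randn(n,1);
      theta = m2(x);
      tb = zeros(B,1);                % bootstrap replicates
      for b = 1:B
         tb(b) = m2(x(ceil(n*rand(n,1))));
      end
      sehatboot(m) = std(tb);
      biasboot(m)  = mean(tb) - theta;
      tj = zeros(n,1);                % jackknife replicates
      for i = 1:n
         tj(i) = m2(x([1:i-1, i+1:n]));
      end
      sehatjack(m) = sqrt((n-1)/n * sum((tj - mean(tj)).^2));
      biasjack(m)  = (n-1)*(mean(tj) - theta);
   end
   % distributions of the estimates over the Monte Carlo trials
   subplot(2,2,1), hist(sehatboot), title('Bootstrap SE')
   subplot(2,2,2), hist(sehatjack), title('Jackknife SE')
   subplot(2,2,3), hist(biasboot),  title('Bootstrap bias')
   subplot(2,2,4), hist(biasjack),  title('Jackknife bias')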
7.7. Repeat problem 7.4 and use Monte Carlo simulation to compare the
bootstrap and jackknife estimates of bias for the sample coefficient of