Exercises

7.1. The insulate data set [Hand, et al., 1994] contains observations corresponding to the average outside temperature in degrees Celsius and the amount of weekly gas consumption measured in 1000 cubic feet. Do a scatterplot of the data corresponding to the measurements taken before insulation was installed. What is a good model for these data? Use cross-validation with K = 1 to estimate the prediction error for your model. Use cross-validation with K = 4. Does your error change significantly? Repeat the process for the data taken after insulation was installed.
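
A minimal sketch of the K = 1 (leave one observation out at a time) loop is given below. The vectors x and y, holding the before-insulation temperature and gas consumption values, and the first-degree polynomial used as the model are assumptions for illustration only.

   % Sketch of K = 1 cross-validation; x and y are assumed to hold the
   % before-insulation temperatures and gas consumption values.
   n = length(x);
   sqerr = zeros(1,n);
   for i = 1:n
      ind = [1:i-1, i+1:n];             % training indices with point i left out
      p = polyfit(x(ind), y(ind), 1);   % placeholder model: first-degree polynomial
      sqerr(i) = (y(i) - polyval(p, x(i)))^2;
   end
   pehat = mean(sqerr)                  % estimated prediction error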
7.2. Using the same procedure as in Example 7.2, use a quadratic (degree 2) and a cubic (degree 3) polynomial to build the model. What is the estimated prediction error from these models? Which one seems best: linear, quadratic, or cubic?
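
One way to organize the comparison is to wrap the cross-validation loop above in an outer loop over the polynomial degree, as sketched below; the variable names and the use of the leave-one-out error are again assumptions.

   % Compare degrees 1 through 3 by their estimated prediction error;
   % x and y are assumed to hold the data used in Example 7.2.
   n = length(x);
   pehat = zeros(1,3);
   for d = 1:3
      sqerr = zeros(1,n);
      for i = 1:n
         ind = [1:i-1, i+1:n];
         p = polyfit(x(ind), y(ind), d);
         sqerr(i) = (y(i) - polyval(p, x(i)))^2;
      end
      pehat(d) = mean(sqerr);
   end
   pehat                       % smallest entry suggests the preferred degree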
7.3. The peanuts data set [Hand, et al., 1994; Draper and Smith, 1981] contains measurements of the aflatoxin level (X) and the corresponding percentage of non-contaminated peanuts in the batch (Y). Do a scatterplot of these data. What is a good model for these data? Use cross-validation to choose the best model.
7.4. Generate n = 25 random variables from a standard normal distribution that will serve as the random sample. Determine the jackknife estimate of the standard error for the sample mean x̄, and calculate the bootstrap estimate of the standard error. Compare these to the theoretical value of the standard error (see Chapter 3).
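
The sketch below shows one way to obtain both estimates using only basic MATLAB functions; the number of bootstrap replicates (B = 500) is an arbitrary choice.

   n = 25;
   x = randn(1,n);                       % the random sample
   % Jackknife: leave out one observation at a time and recompute the mean.
   xbarj = zeros(1,n);
   for i = 1:n
      xbarj(i) = mean(x([1:i-1, i+1:n]));
   end
   sejack = sqrt((n-1)/n*sum((xbarj - mean(xbarj)).^2));
   % Bootstrap: resample with replacement B times (B = 500 is arbitrary).
   B = 500;
   xbarb = zeros(1,B);
   for b = 1:B
      xbarb(b) = mean(x(ceil(n*rand(1,n))));
   end
   seboot = std(xbarb);
   setrue = 1/sqrt(n);                   % theoretical value (see Chapter 3)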
7.5. Using a sample size of n = 15, generate random variables from a uniform (0,1) distribution. Determine the jackknife estimate of the standard error for x̄, and calculate the bootstrap estimate of the standard error for the same statistic. Let's say we decide to use s ⁄ √n as an estimate of the standard error for x̄. How does this compare to the other estimates?
7.6. Use Monte Carlo simulation to compare the performance of the bootstrap and the jackknife methods for estimating the standard error and bias of the sample second central moment. For every Monte Carlo trial, generate 100 standard normal random variables and calculate the bootstrap and jackknife estimates of the standard error and bias. Show the distribution of the bootstrap estimates (of bias and standard error) and the jackknife estimates (of bias and standard error) in a histogram or a box plot. Make some comparisons of the two methods.
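
A sketch of the Monte Carlo loop is given below; the number of trials (M = 100) and bootstrap replicates (B = 200) are arbitrary choices, and the jackknife bias is computed with the usual (n − 1)(mean of the leave-one-out values − original estimate) formula.

   M = 100;                              % number of Monte Carlo trials (arbitrary)
   n = 100; B = 200;                     % sample size and bootstrap replicates
   bboot = zeros(1,M); bjack = zeros(1,M);
   seboot = zeros(1,M); sejack = zeros(1,M);
   for m = 1:M
      x = randn(1,n);
      t = mean((x - mean(x)).^2);        % sample second central moment
      % Bootstrap estimates of standard error and bias.
      tb = zeros(1,B);
      for b = 1:B
         xb = x(ceil(n*rand(1,n)));      % resample with replacement
         tb(b) = mean((xb - mean(xb)).^2);
      end
      seboot(m) = std(tb);
      bboot(m) = mean(tb) - t;
      % Jackknife estimates of standard error and bias.
      tj = zeros(1,n);
      for i = 1:n
         xi = x([1:i-1, i+1:n]);
         tj(i) = mean((xi - mean(xi)).^2);
      end
      sejack(m) = sqrt((n-1)/n*sum((tj - mean(tj)).^2));
      bjack(m) = (n-1)*(mean(tj) - t);
   end
   % Display the distributions of the bias estimates; the standard error
   % estimates can be shown the same way.
   subplot(1,2,1), hist(bboot), title('Bootstrap bias')
   subplot(1,2,2), hist(bjack), title('Jackknife bias')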
7.7. Repeat problem 7.4 and use Monte Carlo simulation to compare the bootstrap and jackknife estimates of bias for the sample coefficient of

