Page 128 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
In order to obtain bootstrap distributions with R one must first load the boot
package with library(boot). One can check whether the package is loaded with
the search() function (see section 1.7.2.2).
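As a minimal sketch of this step, the lines below check whether boot is available, attach it, and then confirm that it appears in the search path. The variable names installed and attached are illustrative only and are not part of the original text.

```r
# Is the boot package available on this system? (does not attach it)
installed <- requireNamespace("boot", quietly = TRUE)

# Attach the package so that boot() can be called directly
if (installed) library(boot)

# search() lists the attached packages; boot appears as "package:boot"
attached <- "package:boot" %in% search()
```

If attached is FALSE, the package is not on the search path and calls to boot will fail.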
The boot function of the boot package generates m bootstrap replicates of
a statistical function, denoted statistic, whose name is passed as an argument.
This function must take as its second argument a vector of indices,
frequencies or weights. In our applications we will use a vector of indices, which
corresponds to leaving the stype argument at its default value, stype="i".
Since it is the default value, we need not mention it when calling boot.
Because of this required second argument, one has to write the
code of the statistical function oneself. Let us consider Example 3.10. Supposing the
clays data frame has been created and attached, it would be solved in R in the
following way:
> sdboot <- function(x,i)sd(x[i])
> b <- boot(CaO,sdboot,1000)
The first line defines the function sdboot with two arguments. The first
argument is the data. The second argument is the vector of indices which will be
used to store the index information of the bootstrap samples. The function itself
computes the standard deviation of those data elements whose indices are in the
index vector i (see the last paragraph of section 2.1.2.4).
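To make the role of the index vector concrete, the following base R sketch imitates by hand what happens for each replicate: a vector of indices is drawn with replacement and passed to the statistic as its second argument. This is an illustration under stated assumptions, not the internal code of boot; the data x are simulated stand-ins (not the CaO values of the clays data), and the names m, n and t_rep are hypothetical.

```r
set.seed(1)                            # only to make the sketch reproducible
x <- rnorm(50, mean = 0.3, sd = 0.1)   # simulated stand-in data, NOT the clays CaO values

sdboot <- function(x, i) sd(x[i])      # same statistic as in the text

m <- 1000                              # number of bootstrap replicates
n <- length(x)
t_rep <- numeric(m)                    # will hold the m replicates

for (k in seq_len(m)) {
  i <- sample(n, n, replace = TRUE)    # index vector of one bootstrap sample
  t_rep[k] <- sdboot(x, i)             # statistic evaluated on the resampled data
}

t0 <- sdboot(x, seq_len(n))            # statistic on the original data (indices 1..n)
```

Calling sdboot with the indices 1 to n reproduces the ordinary sample standard deviation, which is the value reported for the original data.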
The boot function returns a so-called bootstrap object, denoted above as b. By
listing b one may obtain:
Bootstrap Statistics :
      original        bias    std. error
t1* 0.08601075 -0.00082119   0.007099508
which agrees fairly well with the values computed with MATLAB in Example
3.10. One of the components of the bootstrap object is the vector with the bootstrap
replicates, denoted t. The histogram of the bootstrap distribution can therefore be
obtained with:
> hist(b$t)
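The columns of the listing can be recomputed from the components of the bootstrap object, which may help in reading them. The sketch below assumes the boot package is available and uses simulated stand-in data x rather than the CaO values, so its numbers will differ from those shown above; the names original, bias and std_error are illustrative. In boot, the number of replicates is the argument R, the component t0 holds the statistic of the original data, and t holds the replicates.

```r
library(boot)

set.seed(1)                            # only to make the sketch reproducible
x <- rnorm(50, mean = 0.3, sd = 0.1)   # simulated stand-in data, NOT the clays CaO values

sdboot <- function(x, i) sd(x[i])      # same statistic as in the text
b <- boot(data = x, statistic = sdboot, R = 1000)

original  <- b$t0                      # statistic computed on the original data
bias      <- mean(b$t[, 1]) - b$t0     # mean of the replicates minus the original value
std_error <- sd(b$t[, 1])              # standard deviation of the replicates
```

The three values correspond to the original, bias and std. error columns printed when listing b.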
Exercises
3.1 Consider the 1−α₁ and 1−α₂ confidence intervals of a given statistic with 1−α₁ > 1−α₂.
Why is the confidence interval for 1−α₁ always larger than or equal to the interval for
1−α₂?
3.2 Consider the measurements of bottle bottoms of the Moulds dataset. Determine the
95% confidence interval of the mean and the x-charts of the three variables RC, CG
and EG. Taking into account the x-chart, discuss whether the 95% confidence interval
of the RC mean can be considered a reliable estimate.