Page 217 - Computational Statistics Handbook with MATLAB
6.3 Monte Carlo Methods for Inferential Statistics
The sampling distribution is known for many statistics. However, these distributions are
typically derived under assumptions about the underlying population under
study, or they are valid only for large sample sizes. In many cases, we do not know the sampling
distribution for the statistic, or we cannot be sure that the assumptions are
satisfied. We can address these cases using Monte Carlo simulation methods,
which are the topic of this section. Some of the uses of Monte Carlo simulation
for inferential statistics are the following:
• Performing inference when the distribution of the test statistic is not known analytically,
• Assessing the performance of inferential methods when parametric assumptions are violated,
• Testing the null and alternative hypotheses under various conditions,
• Evaluating the performance (e.g., power) of inferential methods,
• Comparing the quality of estimators.
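As a minimal sketch of the second use above (assessing an inferential method when its assumptions do not hold), the Python code below estimates the actual Type I error rate of a nominal 5% two-sided z-test of H0: μ = μ0 that plugs in the sample standard deviation. The helper name `estimated_type1_error`, the choice of test, and the use of Python's standard `random` and `statistics` modules are assumptions made for illustration. The population is passed in as a function that draws one pseudorandom observation, so the same code works for a normal, exponential, or any other assumed population.

```python
import random
import statistics


def estimated_type1_error(draw, mu0, n, alpha=0.05, trials=10_000, seed=1):
    """Monte Carlo estimate of the actual Type I error rate of a two-sided
    z-test of H0: mu = mu0 that uses the sample standard deviation.

    draw(rng) must return one observation from a population whose mean
    is mu0, so that H0 is true and every rejection is a Type I error.
    """
    rng = random.Random(seed)                      # reproducible stream
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        x = [draw(rng) for _ in range(n)]          # one simulated sample
        se = statistics.stdev(x) / n ** 0.5        # estimated standard error
        z = (statistics.fmean(x) - mu0) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


# Normal population with mean 0, and exponential population with mean 1.
normal_rate = estimated_type1_error(lambda rng: rng.gauss(0.0, 1.0), mu0=0.0, n=10)
expo_rate = estimated_type1_error(lambda rng: rng.expovariate(1.0), mu0=1.0, n=10)
print(f"normal: {normal_rate:.3f}  exponential: {expo_rate:.3f}")
```

Because H0 is true in both calls, the fraction of rejections estimates the actual Type I error. For normal samples of size 10 the statistic really follows a t distribution with 9 degrees of freedom, so the normal estimate should come out near 0.08 rather than the nominal 0.05. The skewed exponential case shows how a different parent distribution changes the rate.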
In this section, we cover situations in inferential statistics where we do
know something about the distribution of the population our sample came
from or we are willing to make assumptions about the distribution. In Section
6.4, we discuss bootstrap methods that can be used when no assumptions are
made about the underlying distribution of the population.
Basic Monte Carlo Procedure
The fundamental idea behind Monte Carlo simulation for inferential statistics
is that insights regarding the characteristics of a statistic can be gained by
repeatedly drawing random samples from the same population of interest
and observing the behavior of the statistic over the samples. In other words,
we estimate the distribution of the statistic by randomly sampling from the
population and recording the value of the statistic for each sample. The
observed values of the statistic for these samples are used to estimate the
distribution.
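The procedure can be sketched in Python as below, assuming the population is represented by a function that returns one pseudorandom observation. The function name `simulate_statistic`, its parameters (`n` for the sample size, `M` for the number of Monte Carlo samples), and the example of the sample median under a standard normal population are illustrative choices, not taken from the text.

```python
import random
import statistics


def simulate_statistic(draw, statistic, n, M, seed=0):
    """Estimate the sampling distribution of `statistic` by drawing M
    random samples of size n from an assumed population and recording
    the statistic's value for each sample."""
    rng = random.Random(seed)
    values = []
    for _ in range(M):
        sample = [draw(rng) for _ in range(n)]  # one pseudorandom sample
        values.append(statistic(sample))        # record the statistic
    return values


# Sampling distribution of the median for samples of size 25 drawn from
# a standard normal population (an assumed example).
meds = simulate_statistic(lambda rng: rng.gauss(0.0, 1.0),
                          statistics.median, n=25, M=5000)
print(f"mean of medians: {statistics.fmean(meds):.3f}")
print(f"std. error of median: {statistics.stdev(meds):.3f}")
```

The returned list plays the role of the observed values of the statistic: a histogram of it, or summaries such as its mean and standard deviation, estimate the sampling distribution. In this example the standard deviation should come out roughly 0.25, close to the large-sample standard error of the median for normal data, 1.253/√25.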
The first step is to decide on a pseudo-population that the analyst assumes
represents the real population in all relevant aspects. We use the word pseudo
here to emphasize the fact that we obtain our samples using a computer and
pseudorandom numbers. For example, we might assume that the underlying
population is exponentially distributed if the random variable represents
the time before a part fails, or we could assume the random variable comes
from a normal distribution if we are measuring IQ scores. The pseudo-popu-
© 2002 by Chapman & Hall/CRC