Page 108 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

3.2 Estimating a Mean

Several values of n, ranging from 20 to 30, are considered “large” in the
literature. As far as the normality assumption of $\bar{X}$ is concerned, n = 20 is
usually enough. As to the deviation between $z_{1-\alpha/2}$ and $t_{1-\alpha/2}$, it is about 5% for
n = 25 and α = 0.05. In the sequel, we will use the threshold n = 25 to distinguish
small samples from large samples. Therefore, when estimating a mean we adopt
the following procedure:

1. Large sample (n ≥ 25): Use formulas 3.9 (substituting σ by s) or 3.12 (if
improved accuracy is needed). No normality assumption of X is needed.
2. Small sample (n < 25) and population distribution can be assumed to be
normal: Use formula 3.12.

For simplicity most of the software products use formula 3.12 irrespective of the
values of n (for small n the normality assumption has to be checked using the
goodness of fit tests described in section 5.1).
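
The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: formula 3.9 is taken to be the normal-quantile interval and formula 3.12 the Student's t interval, which is how the text uses them; the helper names (t_pdf, t_cdf, t_quantile, mean_ci) are hypothetical; and the t quantile is obtained numerically (Simpson's rule plus bisection) as a standard-library stand-in for a statistics package routine.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """CDF for x >= 0: 0.5 plus a Simpson's-rule integral of the pdf on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p: float, df: int) -> float:
    """Quantile for 0.5 < p < 1, located by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_ci(xbar, s, n, alpha=0.05, threshold=25, prefer_t=True):
    """Confidence interval for a mean, following the large/small sample rule."""
    se = s / math.sqrt(n)
    if n >= threshold and not prefer_t:
        # Assumed form of formula 3.9, with s substituted for sigma.
        q = NormalDist().inv_cdf(1 - alpha / 2)
    else:
        # Assumed form of formula 3.12 (Student's t with n - 1 degrees of freedom).
        q = t_quantile(1 - alpha / 2, n - 1)
    return xbar - q * se, xbar + q * se


# Relative deviation between the z and t quantiles for n = 25, alpha = 0.05.
z = NormalDist().inv_cdf(0.975)
t = t_quantile(0.975, 24)
deviation = (t - z) / z
print(f"z = {z:.3f}, t = {t:.3f}, deviation = {deviation:.1%}")
```

With these assumptions the deviation for n = 25 comes out at about 5%, in line with the figure quoted above. Setting prefer_t=True by default mirrors the remark that most software uses formula 3.12 irrespective of n.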

Example 3.1
Q: Consider the data relative to the variable PRT for the first class (CLASS=1) of
the Cork Stoppers’ dataset. Compute the 95% confidence interval of its
mean.

A: There are n = 50 cases. The sample mean and sample standard deviation are
$\bar{x} = 365$ and s = 110, respectively. The standard error is $SE = s/\sqrt{n} = 15.6$. We
apply formula 3.12, obtaining the confidence interval:

$\bar{x} \pm t_{49,\,0.975} \times SE = \bar{x} \pm 2.01 \times 15.6 = 365 \pm 31.$

Notice that this confidence interval corresponds to a tolerance of 31/365 ≈ 8%.
If we used in this large sample situation the normal approximation formula 3.9 we
would obtain a very close result.
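
The arithmetic of this example can be checked with a short Python sketch. It uses only the summary statistics quoted in Example 3.1 (not the Cork Stoppers data themselves) and takes the t quantile 2.01 from the text; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, xbar, s = 50, 365.0, 110.0      # summary statistics quoted in Example 3.1
se = s / math.sqrt(n)              # standard error of the mean

t_q = 2.01                         # t quantile for 49 d.f., as quoted in the text
z_q = NormalDist().inv_cdf(0.975)  # standard normal quantile, about 1.96

half_t = t_q * se                  # half-width using the t quantile
half_z = z_q * se                  # half-width using the normal approximation

print(f"SE = {se:.1f}")
print(f"t-based interval: {xbar:.0f} +/- {half_t:.0f}")
print(f"z-based interval: {xbar:.0f} +/- {half_z:.0f}")
```

The two half-widths round to 31 and 30, which illustrates how close the normal approximation is for n = 50.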
Given the interpretation of a confidence interval (sections 3.1 and 1.5), we expect
that in a large number of repetitions of 50 PRT measurements, under the same
conditions used for the presented dataset, confidence intervals computed like
the one derived above will cover the true PRT mean 95% of the time. In
other words, when presenting [334, 396] as a confidence interval for the PRT
mean, we incur only a 5% risk of being wrong by basing our estimate on
an atypical dataset.
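
This repeated-sampling interpretation can be illustrated with a small simulation. The sketch below is not based on the Cork Stoppers data: it assumes a hypothetical normal population whose mean and standard deviation are set to the sample values of Example 3.1, draws many samples of 50 cases, and counts how often the interval built with the t quantile 2.01 covers the assumed true mean.

```python
import math
import random
import statistics

random.seed(0)                       # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 365.0, 110.0    # hypothetical population parameters
N, T_Q = 50, 2.01                    # sample size and t quantile from Example 3.1
REPS = 2000                          # number of simulated repetitions

covered = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    # Count the repetition if the interval contains the assumed true mean.
    if xbar - T_Q * se <= TRUE_MEAN <= xbar + T_Q * se:
        covered += 1

coverage = covered / REPS
print(f"fraction of intervals covering the true mean: {coverage:.3f}")
```

The observed fraction should be close to 0.95; it varies slightly from run to run when the seed is changed.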

Example 3.2
Q: Consider the subset of the previous PRT data constituted by the first n = 20
cases. Compute the 95% confidence interval of its mean.
A: The sample mean and sample standard deviation are now $\bar{x} = 351$ and s = 83,
respectively. The standard error is $SE = s/\sqrt{n} = 18.56$. Since n = 20, we apply the
small sample estimate formula 3.12 assuming that the PRT distribution can be well
