Page 115 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R


94 3 Estimating Data Parameters

The deviation of this formula from the exact formula is negligible for large n
(see e.g. Spiegel MR, Schiller J, Srinivasan RA, 2000, for details).
One can also assume a worst-case situation for σ, corresponding to p = q = ½, which gives $\sigma = 1/(2\sqrt{n})$. Taking $z_{0.975} = 1.96 \approx 2$, the approximate 95% confidence interval is then easy to remember:

$$\hat{p} \pm 1/\sqrt{n}.$$
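A quick numerical check of this rule of thumb, as a minimal Python sketch. It assumes the normal approximation with z = 1.96, and the helper names `worst_case_half_width` and `exact_half_width` are hypothetical. The sketch shows that the worst-case half-width $1/\sqrt{n}$ slightly overstates $1.96\sqrt{\hat{p}\hat{q}/n}$, and more so when $\hat{p}$ is far from ½.

```python
import math

def worst_case_half_width(n: int) -> float:
    """Approximate 95% half-width with p = q = 1/2 and z ~ 2 (hypothetical helper)."""
    sigma = 1.0 / (2.0 * math.sqrt(n))   # sqrt(p*q/n) with p = q = 1/2
    return 2.0 * sigma                   # 2*sigma equals 1/sqrt(n)

def exact_half_width(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width z*sqrt(p_hat*q_hat/n) of the normal-approximation interval."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Compare the rule of thumb with the 1.96-sigma half-width for a few sample sizes
for n in (100, 400, 2500):
    print(n, round(worst_case_half_width(n), 4),
          round(exact_half_width(0.5, n), 4),
          round(exact_half_width(0.28, n), 4))
```

Because 2 > 1.96 and $\hat{p}\hat{q} \le ¼$, the rule of thumb is always slightly conservative.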

Also note that if we decrease the tolerance while keeping n fixed, the confidence
level decreases, as already mentioned in Chapter 1 and shown in Figure 1.6.
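To see this trade-off numerically, the following Python sketch fixes n and $\hat{p}$ (values borrowed from Example 3.5 purely for illustration). It then computes the confidence level 1 − α whose normal-approximation half-width equals a given tolerance. The function name `confidence_level` is hypothetical, and using `math.erf` to evaluate P(|Z| < z) for a standard normal Z is an implementation choice, not the book's method.

```python
import math

def confidence_level(tolerance: float, p_hat: float, n: int) -> float:
    """Confidence 1 - alpha such that z_{1-alpha/2} * sigma == tolerance,
    with sigma = sqrt(p_hat*q_hat/n) (normal approximation; hypothetical helper)."""
    sigma = math.sqrt(p_hat * (1.0 - p_hat) / n)
    z = tolerance / sigma
    # For a standard normal Z, P(|Z| < z) = 2*Phi(z) - 1 = erf(z / sqrt(2))
    return math.erf(z / math.sqrt(2.0))

n, p_hat = 132, 0.28   # illustrative values, n kept fixed
for tol in (0.08, 0.06, 0.04, 0.02):
    print(f"tolerance {tol:.2f}: confidence {confidence_level(tol, p_hat, n):.3f}")
```

Each smaller tolerance yields a lower confidence level for the same n.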

Example 3.5

Q: Consider, for the Freshmen dataset, the estimation of the proportion of
freshmen who are displaced from their home (variable DISPL). Compute the 95%
confidence interval of this proportion.

A: There are n = 132 cases, 37 of which are displaced, i.e., $\hat{p}$ = 37/132 ≈ 0.28. Applying
formula 3.15, we have:

$$\hat{p} - 1.96\sqrt{\hat{p}\hat{q}/n} < p < \hat{p} + 1.96\sqrt{\hat{p}\hat{q}/n} \;\Rightarrow\; 0.20 < p < 0.36.$$
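The computation of Example 3.5 can be reproduced with a few lines of Python. This is a minimal sketch of the large-sample (normal approximation) interval used above. The function name `proportion_ci` is hypothetical, and z = 1.96 is hard-coded as the default for 95% confidence.

```python
import math

def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval p_hat -/+ z*sqrt(p_hat*q_hat/n) (hypothetical helper)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = proportion_ci(37, 132)    # 37 displaced freshmen out of 132
print(f"{low:.2f} < p < {high:.2f}")  # → 0.20 < p < 0.36
```

The half-width here is about 0.077, which is why the interval is so wide relative to $\hat{p}$.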



Note that this confidence interval is quite wide. The following example gives
some indication of the sample sizes needed to obtain reasonably useful confidence
intervals.

Example 3.6
Q: Consider the interval estimation of a proportion under the same conditions as in the
previous example, i.e., with estimated proportion $\hat{p}$ = 0.28 and α = 5%. How large
should the sample size be for the confidence interval endpoints to deviate less than
ε = 2% from $\hat{p}$?

A: In general, we must apply the following condition:
$$z_{1-\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le \varepsilon \;\Rightarrow\; n \ge \left(\frac{z_{1-\alpha/2}\sqrt{\hat{p}\hat{q}}}{\varepsilon}\right)^2 \qquad (3.17)$$

In the present case, we must have $n \ge (1.96/0.02)^2 \times 0.28 \times 0.72 \approx 1936.2$, i.e., $n \ge 1937$. As with the estimation of a mean, n
grows with the square of 1/ε. In fact, assuming the worst-case situation
for σ, as we did above, the following approximate formula holds for the 95% confidence
level: $n \gtrsim 1/\varepsilon^2$.
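Formula 3.17 and its worst-case approximation translate directly into code. The following is a minimal Python sketch, assuming the normal approximation and z = 1.96; the function names are hypothetical. The exact bound is rounded up with `math.ceil` because n must be an integer satisfying the inequality. The worst-case rule $n \approx 1/\varepsilon^2$ is only an approximation (it uses z ≈ 2), so it is simply rounded.

```python
import math

def min_sample_size(p_hat: float, eps: float, z: float = 1.96) -> int:
    """Smallest integer n with z*sqrt(p_hat*q_hat/n) <= eps, per formula 3.17 (hypothetical helper)."""
    return math.ceil((z / eps) ** 2 * p_hat * (1.0 - p_hat))

def worst_case_sample_size(eps: float) -> int:
    """Worst case p = q = 1/2 with z ~ 2: n ~ (2/eps)**2 / 4 = 1/eps**2 (hypothetical helper)."""
    return round(1.0 / eps ** 2)

print(min_sample_size(0.28, 0.02))    # 1937
print(worst_case_sample_size(0.02))   # 2500
```

Halving ε to 1% multiplies both figures by about four, in line with the quadratic growth in 1/ε.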

Confidence intervals for proportions, and lower bounds on n achieving a desired
deviation in proportion estimation, can be computed with Tools.xls.
Interval estimation of a proportion can be carried out with SPSS, STATISTICA,
MATLAB and R in the same way as we did with means. The only preliminary step
