Page 115 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

94       3 Estimating Data Parameters


              The deviation of this formula from the exact formula is negligible for large n
           (see e.g. Spiegel MR, Schiller J, Srinivasan RA, 2000, for details).
              One can also assume a worst-case situation for σ, corresponding to p = q = ½,
           which gives $\sigma = (2\sqrt{n})^{-1}$. Since $z_{0.975} = 1.96 \approx 2$, the approximate 95%
           confidence interval is now easy to remember:

              $\hat{p} \pm 1/\sqrt{n}$.
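
           The rule of thumb $\hat{p} \pm 1/\sqrt{n}$ can be checked numerically. The following is a minimal Python sketch, not taken from the book; the function names are illustrative assumptions. It compares the worst-case half-width $z_{0.975}/(2\sqrt{n})$ with the approximation $1/\sqrt{n}$:

```python
import math
from statistics import NormalDist

def worst_case_half_width(n, conf=0.95):
    """Half-width of the normal-approximation interval when p = q = 1/2.

    Hypothetical helper name, introduced only for illustration.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{1-alpha/2}, about 1.96 for 95%
    sigma = 1 / (2 * math.sqrt(n))                # worst-case standard deviation of p-hat
    return z * sigma

def rule_of_thumb_half_width(n):
    """Easy-to-remember approximation 1/sqrt(n), which rounds z up to 2."""
    return 1 / math.sqrt(n)

if __name__ == "__main__":
    for n in (100, 400, 2500):
        print(n, round(worst_case_half_width(n), 4), round(rule_of_thumb_half_width(n), 4))
```

           Because the approximation rounds 1.96 up to 2, the rule of thumb is slightly conservative: it never understates the worst-case half-width at the 95% level.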

              Also, note that if we decrease the tolerance while maintaining n, the confidence
           level decreases as already mentioned in Chapter 1 and shown in Figure 1.6.
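
           The trade-off between tolerance and confidence level for fixed n can be illustrated with a short Python sketch. It assumes the normal approximation, under which the confidence level for a tolerance ε is roughly $2\Phi(\varepsilon/\sigma) - 1$ with $\sigma = \sqrt{pq/n}$; the function name and the sample values are illustrative assumptions, not from the book:

```python
import math
from statistics import NormalDist

def confidence_level(tol, n, p=0.5):
    """Approximate P(|p_hat - p| <= tol) under the normal approximation.

    p defaults to the worst case p = q = 1/2 (an assumption of this sketch).
    """
    sigma = math.sqrt(p * (1 - p) / n)          # standard deviation of p-hat
    return 2 * NormalDist().cdf(tol / sigma) - 1

if __name__ == "__main__":
    n = 100  # sample size held fixed
    for tol in (0.10, 0.05, 0.02):
        print(f"tolerance {tol:.2f}: confidence level ~ {confidence_level(tol, n):.3f}")
```

           With n held at 100, tightening the tolerance from 0.10 to 0.02 lowers the attainable confidence level, in line with the remark above.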

           Example 3.5

           Q: Consider, for the Freshmen dataset, the estimation of the proportion of
           freshmen that are displaced from their home (variable DISPL). Compute the 95%
           confidence interval of this proportion.

           A: There are n = 132 cases, 37 of which are displaced, i.e., $\hat{p} = 0.28$. Applying
           formula 3.15, we have:

              $\hat{p} - 1.96\sqrt{\hat{p}\hat{q}/n} < p < \hat{p} + 1.96\sqrt{\hat{p}\hat{q}/n} \;\Rightarrow\; 0.20 < p < 0.36$.
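
           The arithmetic of this example can be reproduced with a minimal Python sketch of the normal-approximation interval. The function name is an illustrative assumption, and the counts 37 and 132 are those quoted in the answer:

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, conf=0.95):
    """Normal-approximation confidence interval for a proportion.

    Hypothetical helper, sketched from the interval formula used in the example.
    """
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
    half_width = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width

if __name__ == "__main__":
    low, high = proportion_ci(37, 132)
    print(f"{low:.2f} < p < {high:.2f}")  # prints "0.20 < p < 0.36"
```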
                                                           
                               

              Note that this confidence interval is quite wide. The following example
           gives some indication of the sample size needed to obtain reasonably useful
           confidence intervals.


           Example 3.6
           Q: Consider the interval estimation of a proportion in the same conditions as the
           previous example, i.e., with estimated proportion $\hat{p} = 0.28$ and α = 5%. How large
           should the sample size be for the confidence interval endpoints to deviate less than
           ε = 2% from the estimated proportion?

           A: In general, we must apply the following condition:

              $z_{1-\alpha/2}\sqrt{\hat{p}\hat{q}/n} \le \varepsilon \;\Rightarrow\; n \ge \left(\frac{z_{1-\alpha/2}}{\varepsilon}\right)^{2} \hat{p}\hat{q}$.          3.17

              In the present case, we must have n > 1628. As with the estimation of a mean, n
           grows with the square of 1/ε. As a matter of fact, assuming the worst case situation
           for σ, as we did above, the following approximate formula for 95% confidence
           level holds:  $n \gtrsim (1/\varepsilon)^{2}$.
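
           The sample-size condition and its worst-case approximation can be sketched in Python as follows. The function names are illustrative assumptions. Evaluated with the rounded inputs $\hat{p} = 0.28$, $z_{0.975} \approx 1.96$ and ε = 0.02, this sketch yields a bound of about 1937, which differs from the figure quoted in the text; the exact inputs behind the quoted figure are not stated here, so the comparison should be read with caution.

```python
import math
from statistics import NormalDist

def min_sample_size(p_hat, eps, conf=0.95):
    """Smallest integer n with z * sqrt(p_hat * q_hat / n) <= eps.

    Hypothetical helper that evaluates (z / eps)^2 * p_hat * q_hat and rounds up.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
    return math.ceil((z / eps) ** 2 * p_hat * (1 - p_hat))

def worst_case_sample_size(eps):
    """Approximate 95% bound (1/eps)^2, assuming p = q = 1/2 and z ~ 2."""
    return math.ceil((1 / eps) ** 2)

if __name__ == "__main__":
    print(min_sample_size(0.28, 0.02))    # bound for the estimated proportion
    print(worst_case_sample_size(0.02))   # worst-case rule of thumb
    # Halving eps multiplies the bound by roughly four: n grows with (1/eps)^2.
    print(min_sample_size(0.28, 0.01) / min_sample_size(0.28, 0.02))
```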


              Confidence intervals for proportions, and lower bounds on n achieving a desired
           deviation in proportion estimation, can be computed with Tools.xls.
              Interval estimation of a proportion can be carried out with SPSS, STATISTICA,
           MATLAB and R in the same way as we did with means. The only preliminary step