Page 114 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

3.3 Estimating a Proportion


           estimation for p. Remember that the sampling distribution of the number of
           “successes” is the binomial distribution (see B.1.5). Given the discreteness of the
           binomial distribution, it may be impossible to find an interval which has exactly
           the desired confidence level. It is possible, however, to choose an interval which
           covers p with probability at least 1 – α.

           Table 3.2. Cumulative binomial probabilities for n = 15, p = 0.33.

           k       0     1    2     3     4    5     6     7    8     9    10
           B(k)   0.002 0.021 0.083  0.217 0.415 0.629 0.805 0.916 0.971 0.992 0.998


              Consider the cumulative binomial probabilities for n = 15, p = 0.33, as shown in
           Table 3.2. Using the values of this table, we can compute the following
           probabilities for intervals centred at k = 5:

              P(4 ≤ k ≤ 6) = B(6) – B(3) = 0.59
              P(3 ≤ k ≤ 7) = B(7) – B(2) = 0.83
              P(2 ≤ k ≤ 8) = B(8) – B(1) = 0.95
              P(1 ≤ k ≤ 9) = B(9) – B(0) = 0.99
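
              The entries of Table 3.2 and the four interval probabilities above can be
           reproduced with a few lines of code. The following is a minimal sketch in Python
           (3.8 or later, standard library only); the helper names binom_cdf and
           interval_prob are illustrative assumptions and do not come from the text.

```python
from math import comb

def binom_cdf(k, n, p):
    """Cumulative binomial probability B(k) = P(X <= k)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def interval_prob(lo, hi, n, p):
    """P(lo <= X <= hi) = B(hi) - B(lo - 1)."""
    lower = binom_cdf(lo - 1, n, p) if lo > 0 else 0.0
    return binom_cdf(hi, n, p) - lower

n, p = 15, 0.33

# Cumulative probabilities B(0), ..., B(10), rounded as in Table 3.2.
table = [round(binom_cdf(k, n, p), 3) for k in range(11)]

# Probabilities of the intervals [5 - h, 5 + h] centred at k = 5.
probs = {(5 - h, 5 + h): round(interval_prob(5 - h, 5 + h, n, p), 2)
         for h in range(1, 5)}

print(table)
print(probs)  # {(4, 6): 0.59, (3, 7): 0.83, (2, 8): 0.95, (1, 9): 0.99}
```

              Working from the exact probabilities rather than the rounded table entries
           gives the same two-decimal values as above.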

              Therefore, a 95% confidence interval corresponds to:

              2 ≤ k ≤ 8  ⇒  2/15 ≤ p ≤ 8/15  ⇒  0.13 ≤ p ≤ 0.53.
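
              The search for an interval covering p with probability at least 1 – α can also
           be automated. The sketch below (Python, standard library only) widens an interval
           centred at k = 5 one step at a time until its binomial probability reaches 1 – α,
           then divides the endpoints by n. The function name centred_interval and the
           symmetric widening rule are assumptions made for illustration; they mirror the
           hand calculation above and are not a general-purpose exact method.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def centred_interval(centre, n, p, alpha=0.05):
    """Widen [centre - h, centre + h] until it covers at least 1 - alpha."""
    for h in range(n + 1):
        lo, hi = max(centre - h, 0), min(centre + h, n)
        coverage = sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))
        if coverage >= 1 - alpha:
            return lo, hi, coverage
    return 0, n, 1.0  # fallback: the full range always covers everything

n, p = 15, 0.33
lo, hi, coverage = centred_interval(5, n, p)

print(lo, hi)                                # 2 8
print(round(lo / n, 2), round(hi / n, 2))    # 0.13 0.53
```

              Because the distribution is discrete, the coverage returned is the first value
           at or above 1 – α, not exactly 1 – α.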
              This is too large an interval to be useful. This example shows the inherent high
           degree of uncertainty when performing an interval estimation of a proportion with
           small n. For large n (say n > 50), we use the normal approximation to the binomial
           distribution as described in section A.7.3. Therefore, the sampling distribution of
           $\hat{p}$ is modelled as $N_{\mu,\sigma}$ with:

              $$\mu = p; \quad \sigma = \sqrt{\frac{pq}{n}} \quad (q = 1 - p;\ \text{see A.7.3}). \tag{3.14}$$

              Thus, the large sample confidence interval of a proportion is:

              $$\hat{p} - z_{1-\alpha/2}\sqrt{pq/n} \;<\; p \;<\; \hat{p} + z_{1-\alpha/2}\sqrt{pq/n}. \tag{3.15}$$

              This is the formula already alluded to in Chapter 1, when describing the
           “uncertainties” about the estimation of a proportion. Note that when applying
           formula 3.15, one usually replaces the true standard deviation with its point
           estimate, i.e., one computes:

              $$\hat{p} - z_{1-\alpha/2}\sqrt{\hat{p}\hat{q}/n} \;<\; p \;<\; \hat{p} + z_{1-\alpha/2}\sqrt{\hat{p}\hat{q}/n}. \tag{3.16}$$
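
              As an illustration of the large-sample interval with the standard deviation
           replaced by its point estimate, the following Python sketch (3.8 or later,
           standard library only) computes the interval from a count of successes. The
           function name proportion_ci and the example data (33 successes in n = 100
           trials) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, alpha=0.05):
    """Large-sample interval for a proportion, using the point
    estimates p_hat and q_hat = 1 - p_hat in the standard deviation."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - alpha / 2)   # normal quantile z_{1 - alpha/2}
    half_width = z * sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical data: 33 successes in 100 trials.
low, high = proportion_ci(33, 100)
print(round(low, 3), round(high, 3))  # 0.238 0.422
```

              With the same point estimate of 0.33, the interval for n = 100 is far narrower
           than the one obtained above for n = 15.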