Page 194 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

5.1 Inference on One Population


           proportion of times. Let us denote the categories or classes of the population by ω,
           coded 1 for the category of interest and 0 for the complement. The two-tailed test
           can then be formalised as:

              H0:   P(ω = 1) = p     (and P(ω = 0) = 1 − p = q);
              H1:   P(ω = 1) ≠ p     (and P(ω = 0) ≠ q).

              Given a data sample with n i.i.d. cases, k of which correspond to ω = 1, we know
           from Chapter 3 (see also Appendix C) that the point estimate of p is p̂ = k/n. In
           order to establish the critical region of the test, we take into account that the
           probability of obtaining k events of ω = 1 in n trials is given by the binomial law.
           Let K denote the random variable associated with the number of times that ω = 1
           occurs in a sample of size n. We then have the binomial sampling distribution
           (section A.7.1):
              \[ P(K = k) = \binom{n}{k} p^{k} q^{n-k}; \qquad k = 0, 1, \ldots, n. \]
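
              The binomial law above can be evaluated directly. The following is a minimal
           Python sketch, not the procedure of any of the packages discussed here; the
           function names and the equal-tails rule (each tail holding at most α/2) are
           our assumptions for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(K = k) for K ~ Binomial(n, p), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, k) * p**k * q ** (n - k)


def noncritical_region(n: int, p: float, alpha: float = 0.05) -> tuple[int, int]:
    """Equal-tails sketch (an assumption, not a package default):
    move inwards from each end while the accumulated tail mass stays <= alpha/2.
    Returns the smallest and largest k that do NOT lead to rejection of H0."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

    lo, tail = 0, 0.0
    while lo < n and tail + pmf[lo] <= alpha / 2:
        tail += pmf[lo]
        lo += 1

    hi, tail = n, 0.0
    while hi > 0 and tail + pmf[hi] <= alpha / 2:
        tail += pmf[hi]
        hi -= 1

    return lo, hi


# Hypothetical example: n = 10 trials, H0: p = 0.5, alpha = 0.05
print(noncritical_region(10, 0.5))
```

           For the hypothetical case n = 10, p = 0.5 and α = 0.05, the sketch returns
           (2, 8): only k = 0, 1, 9 or 10 would lead to the rejection of H0.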

              When n is small (say, below 25), the non-critical region is usually quite large
           and the power of the test quite low. We also found uselessly large confidence
           intervals for small samples in section 3.3, when estimating a proportion. The test
           yields useful results only for large samples (say, above 25). In this case (especially
           when np or nq are larger than 25, see A.7.3), we use the normal approximation of
           the standardised sampling distribution:

              \[ Z = \frac{K - np}{\sqrt{npq}} \sim N_{0,1} \tag{5.3} \]

              Notice that denoting by P the random variable corresponding to the proportion
           of successes in the sample (with observed value p̂ = k/n), we may write 5.3 as:

              \[ Z = \frac{K - np}{\sqrt{npq}} = \frac{K/n - p}{\sqrt{pq/n}} = \frac{P - p}{\sqrt{pq/n}}. \tag{5.4} \]
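
              As a quick numerical check, the count form (5.3) and the proportion form (5.4)
           of the statistic can be computed side by side. This is a minimal sketch; the
           function names and the sample values (n = 100, k = 60, p = 0.5) are hypothetical
           and chosen only for illustration.

```python
from math import sqrt


def z_from_counts(k: int, n: int, p: float) -> float:
    """Formula 5.3: Z = (K - np) / sqrt(npq), evaluated at the observed count k."""
    q = 1.0 - p
    return (k - n * p) / sqrt(n * p * q)


def z_from_proportion(k: int, n: int, p: float) -> float:
    """Formula 5.4: Z = (P - p) / sqrt(pq/n), evaluated at the observed p_hat = k/n."""
    q = 1.0 - p
    p_hat = k / n
    return (p_hat - p) / sqrt(p * q / n)


# Hypothetical sample: 60 cases of the category of interest out of 100, H0: p = 0.5
print(z_from_counts(60, 100, 0.5))
print(z_from_proportion(60, 100, 0.5))
```

           Both forms give the same value up to floating-point rounding (z = 2 for these
           numbers), since 5.4 is obtained from 5.3 by dividing numerator and denominator by n.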

              The binomial test is then performed in the same manner as the test of a single
           mean described in section  4.3.1. The approximation to the normal distribution
           becomes better if a continuity correction is used, reducing by 0.5 the difference
           between the observed mean (np̂) and the expected mean (np).
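
              The effect of the continuity correction can be sketched as follows. This is
           an illustration under our own assumptions, not the code of SPSS or R: the
           function names are hypothetical, the normal distribution function is obtained
           from math.erf, and the two-tailed p-value is taken as twice the upper tail area.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binomial_z_test(k: int, n: int, p: float, correct: bool = True) -> tuple[float, float]:
    """Two-tailed test of H0: P(w = 1) = p using the normal approximation.

    With correct=True, the absolute difference between the observed mean (k)
    and the expected mean (np) is reduced by 0.5 (never below zero).
    Returns (|z|, two-tailed p-value)."""
    q = 1.0 - p
    diff = abs(k - n * p)
    if correct:
        diff = max(diff - 0.5, 0.0)
    z = diff / sqrt(n * p * q)
    p_value = 2.0 * (1.0 - normal_cdf(z))
    return z, p_value


# Hypothetical sample: k = 60 out of n = 100, H0: p = 0.5
print(binomial_z_test(60, 100, 0.5, correct=True))
print(binomial_z_test(60, 100, 0.5, correct=False))
```

           For these hypothetical values the corrected statistic is |z| = 1.9 (p-value of
           about 0.057), whereas the uncorrected one is |z| = 2 (p-value of about 0.046), so
           the decision at a 5% level of significance depends on the correction.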
              As shown in Commands 5.3, SPSS and R have a specific command for carrying
           out the binomial test. SPSS uses the normal approximation with continuity
           correction for n > 25. R uses a similar procedure. In order to perform the binomial
           test with STATISTICA or MATLAB, one uses the single sample t test command.
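
              To see what a single sample t test computes on dichotomous data, one may
           evaluate the t statistic by hand. The sketch below assumes that the test is
           applied to the 0/1 coded variable with test value p; the function name and the
           sample values are hypothetical, and the exact options of the STATISTICA and
           MATLAB commands are not reproduced here.

```python
from math import sqrt


def t_statistic_binary(k: int, n: int, p: float) -> float:
    """One-sample t statistic for a 0/1 coded sample with k ones, testing mean = p.

    For 0/1 data the sum of squared deviations from the sample mean is
    n * p_hat * (1 - p_hat), so the sample variance (n - 1 denominator) is
    n * p_hat * (1 - p_hat) / (n - 1). Requires 0 < k < n."""
    p_hat = k / n
    s2 = n * p_hat * (1.0 - p_hat) / (n - 1)
    return (p_hat - p) / sqrt(s2 / n)


# Hypothetical sample: k = 60 out of n = 100, H0: p = 0.5
print(t_statistic_binary(60, 100, 0.5))
```

           The t statistic uses the variance estimated from the sample instead of the
           variance pq under H0, so its value (about 2.03 for these numbers) differs
           slightly from the value z = 2 given by 5.3.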