Page 194 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
5.1 Inference on One Population 175
proportion of times. Let us denote the categories or classes of the population by ω,
coded 1 for the category of interest and 0 for the complement. The two-tailed test
can then be formalised as:
$H_0$: $P(\omega = 1) = p$ (and $P(\omega = 0) = 1 - p = q$);
$H_1$: $P(\omega = 1) \neq p$ (and $P(\omega = 0) \neq q$).
Given a data sample with n i.i.d. cases, k of which correspond to ω = 1, we know
from Chapter 3 (see also Appendix C) that the point estimate of p is $\hat{p} = k/n$. In
order to establish the critical region of the test, we take into account that the
probability of obtaining k events of ω =1 in n trials is given by the binomial law.
Let K denote the random variable associated with the number of times that ω = 1
occurs in a sample of size n. We then have the binomial sampling distribution
(section A.7.1):
$$P(K = k) = \binom{n}{k} p^k q^{n-k}, \qquad k = 0, 1, \ldots, n.$$
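As a minimal sketch (Python, standard library only; the helper names `binom_pmf` and `noncritical_region` are hypothetical and do not come from any of the packages discussed), the sampling distribution above can be evaluated directly, and the non-critical region of the two-tailed test at level α read off from its cumulative tails. Splitting α equally between the two tails is an assumption; other ways of building the critical region exist.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(K = k) for K ~ Binomial(n, p), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, k) * p**k * q ** (n - k)


def noncritical_region(n: int, p: float, alpha: float = 0.05) -> tuple[int, int]:
    """Equal-tailed non-critical region [lo, hi] of the two-tailed test.

    Values k < lo satisfy P(K <= k) <= alpha/2 and values k > hi satisfy
    P(K >= k) <= alpha/2; those values form the critical region.
    """
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

    # Lower tail: walk up from k = 0 until the cumulative probability exceeds alpha/2.
    lo, cum = 0, 0.0
    for k in range(n + 1):
        cum += pmf[k]
        if cum > alpha / 2:
            lo = k
            break

    # Upper tail: walk down from k = n in the same way.
    hi, cum = n, 0.0
    for k in range(n, -1, -1):
        cum += pmf[k]
        if cum > alpha / 2:
            hi = k
            break

    return lo, hi


print(noncritical_region(10, 0.5))  # (2, 8)
```

For n = 10 and p = 0.5, only k = 0, 1, 9 or 10 would lead to rejecting H0 at the 5% level; every other outcome falls in the non-critical region. Because K is discrete, the attained significance level is 22/1024 ≈ 0.021 rather than exactly 0.05.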
When n is small (say, below 25), the non-critical region is usually quite large
and the power of the test quite low. We also found uselessly large confidence
intervals for small samples in section 3.3, when estimating a proportion. The test
yields useful results only for large samples (say, above 25). In this case (especially
when np or nq are larger than 25, see A.7.3), we use the normal approximation of
the standardised sampling distribution:
$$Z = \frac{K - np}{\sqrt{npq}} \sim N(0, 1) \qquad (5.3)$$
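The statistic 5.3 and its two-tailed p-value can be sketched as follows, assuming the standard normal CDF is computed from `math.erf` through Φ(z) = ½[1 + erf(z/√2)]. The names `normal_cdf`, `binomial_z` and `two_tailed_p` are hypothetical, and this version deliberately omits the continuity correction.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), computed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binomial_z(k: int, n: int, p: float) -> float:
    """Standardised statistic Z = (K - np) / sqrt(npq) of formula 5.3."""
    q = 1.0 - p
    return (k - n * p) / sqrt(n * p * q)


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value 2 * (1 - Phi(|z|)) under the N(0, 1) approximation."""
    return 2.0 * (1.0 - normal_cdf(abs(z)))


z = binomial_z(60, 100, 0.5)
print(z, two_tailed_p(z))
```

With hypothetical counts k = 60 out of n = 100 and p = 0.5, Z = (60 − 50)/5 = 2.0 and the two-tailed p-value is about 0.0455.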
Notice that denoting by P the random variable corresponding to the proportion
of successes in the sample (with observed value $\hat{p} = k/n$), we may write 5.3 as:
$$Z = \frac{K - np}{\sqrt{npq}} = \frac{K/n - p}{\sqrt{pq/n}} = \frac{P - p}{\sqrt{pq/n}} \qquad (5.4)$$
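The middle equality in 5.4 follows from dividing both numerator and denominator by n:

$$\frac{K - np}{\sqrt{npq}} = \frac{(K - np)/n}{\sqrt{npq}/n} = \frac{K/n - p}{\sqrt{npq/n^2}} = \frac{K/n - p}{\sqrt{pq/n}}$$

For example, with hypothetical values k = 60, n = 100 and p = 0.5, the count form gives 10/5 = 2 and the proportion form gives 0.1/0.05 = 2.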
The binomial test is then performed in the same manner as the test of a single
mean described in section 4.3.1. The approximation to the normal distribution
becomes better if a continuity correction is used, reducing by 0.5 the absolute difference
between the observed mean ($n\hat{p}$) and the expected mean ($np$).
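A minimal sketch of the corrected statistic follows. The name `binomial_z_cc` is hypothetical, and clamping the corrected difference at zero (so that an observed-minus-expected difference below 0.5 yields Z = 0) is an assumption about how to handle that edge case, not something stated in the text.

```python
from math import sqrt


def binomial_z_cc(k: int, n: int, p: float) -> float:
    """Z of formula 5.3 with continuity correction: |k - np| reduced by 0.5, sign kept."""
    q = 1.0 - p
    diff = k - n * p
    # Shrink the difference towards zero by 0.5, but never past zero (assumption).
    corrected = max(abs(diff) - 0.5, 0.0)
    signed = corrected if diff >= 0 else -corrected
    return signed / sqrt(n * p * q)


print(binomial_z_cc(60, 100, 0.5))
```

For k = 60, n = 100 and p = 0.5, the corrected statistic is 9.5/5 = 1.9 instead of the uncorrected 2.0.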
As shown in Commands 5.3, SPSS and R have a specific command for carrying
out the binomial test. SPSS uses the normal approximation with continuity
correction for n > 25. R uses a similar procedure. In order to perform the binomial
test with STATISTICA or MATLAB, one uses the single sample t test command.
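For readers working outside these packages, the whole procedure can be sketched in Python under stated assumptions. The helper `binomial_test` is hypothetical: it switches from exact binomial probabilities to the normal approximation with continuity correction when n > 25, mirroring the rule described above for SPSS, and in the exact case it doubles the smaller tail probability. Doubling the smaller tail is one common two-tailed convention, not necessarily the one any of the packages above implements. SciPy users may prefer `scipy.stats.binomtest`, which performs an exact test.

```python
from math import comb, erf, sqrt


def _phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binomial_test(k: int, n: int, p: float = 0.5, n_switch: int = 25) -> float:
    """Two-tailed p-value for H0: P(omega = 1) = p (hypothetical helper).

    n <= n_switch: exact binomial probabilities, doubling the smaller tail.
    n >  n_switch: normal approximation with continuity correction.
    """
    q = 1.0 - p
    if n <= n_switch:
        pmf = [comb(n, i) * p**i * q ** (n - i) for i in range(n + 1)]
        lower = sum(pmf[: k + 1])  # P(K <= k)
        upper = sum(pmf[k:])       # P(K >= k)
        return min(1.0, 2.0 * min(lower, upper))
    # Continuity-corrected |K - np|, clamped at zero (assumption).
    diff = max(abs(k - n * p) - 0.5, 0.0)
    z = diff / sqrt(n * p * q)
    return 2.0 * (1.0 - _phi(z))


print(binomial_test(1, 10))                   # exact, small sample
print(binomial_test(60, 100))                 # normal approximation
print(binomial_test(60, 100, n_switch=100))   # forced exact computation
```

With k = 60 and n = 100, the corrected approximation gives about 0.0574, close to the exact value of about 0.0569, whereas the uncorrected approximation gives about 0.0455. For n = 10 and k = 1, the exact branch returns 22/1024 ≈ 0.0215.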