Page 191 - Statistics for Dummies
Chapter 11: Sampling Distributions and the Central Limit Theorem
Don't forget to divide by the square root of n in the denominator of z. Always divide by the square root of n when the question refers to the average of the x-values.

Revisiting the clerical worker example from the previous section "Sample size and standard error," suppose X is the time it takes a randomly chosen clerical worker to type and send a standard letter of recommendation. Suppose X has a normal distribution, and assume the mean is 10.5 minutes and the standard deviation is 3 minutes. You take a random sample of 50 clerical workers and measure their times. What is the chance that their average time is less than 9.5 minutes?

This question translates to finding $P(\bar{X} < 9.5)$. Because X has a normal distribution to start with, you know $\bar{X}$ also has an exact (not approximate) normal distribution. Converting to z, you get:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{9.5 - 10.5}{3 / \sqrt{50}} = \frac{-1}{0.4243} \approx -2.36$$

So you want $P(Z < -2.36)$, which equals 0.0091 (from the Z-table in the appendix). So the chance that a random sample of 50 clerical workers averages less than 9.5 minutes to complete this task is 0.91% (very small).

How do you find probabilities for $\bar{X}$ if X is not normal or is unknown? Thanks to the CLT, the distribution of X can be non-normal or even unknown; as long as n is large enough, you can still find approximate probabilities for $\bar{X}$ using the standard normal (Z-)distribution and the process described earlier. That is, convert to a z-value and find approximate probabilities using the Z-table (in the appendix).

When you use the CLT to find a probability for $\bar{X}$ (that is, when the distribution of X is not normal or is unknown), be sure to say that your answer is an approximation. You also want to say the approximate answer should be close because you've got a large enough n to use the CLT. (If n is not large enough for the CLT, you can use the t-distribution in many cases; see Chapter 10.)

Beyond actual calculations, probabilities about $\bar{X}$ can help you decide whether an assumption or a claim about a population mean is on target, based on your data. In the clerical workers example, it was assumed that the average time for all workers to type up a recommendation letter was 10.5 minutes. Your sample averaged 9.5 minutes. Because the probability that they would average less than 9.5 minutes was found to be tiny (0.0091), either you got an unusually high number of fast workers in your sample just by chance, or the assumption that the average time for all workers is 10.5 minutes was simply too high. (I'm betting on the latter.) The process of checking assumptions or challenging claims about a population is called hypothesis testing; details are in Chapter 14.

The Sampling Distribution of the Sample Proportion

The Central Limit Theorem (CLT) doesn't apply only to sample means for numerical data. You can also use it with other statistics, including sample proportions for categorical data (see Chapter 6). The population proportion, p, is the proportion of individuals in the population who have a certain characteristic of interest (for example, the proportion of all Americans who are registered voters, or the proportion of all teenagers who own cellphones).

The sample proportion, denoted $\hat{p}$ (pronounced p-hat), is the proportion of individuals in the sample who have that particular characteristic; in other words, the number of individuals in the sample who have that characteristic of interest divided by the total sample size (n).

For example, if you take a sample of 100 teens and find 60 of them own cellphones, the sample proportion of cellphone-owning teens is $\hat{p} = 60/100 = 0.60$.

This section examines the sampling distribution of all possible sample proportions, $\hat{p}$, from samples of size n from a population.

The sampling distribution of $\hat{p}$ has the following properties:
✓ Its mean, denoted by $\mu_{\hat{p}}$ (pronounced mu sub-p-hat), equals the population proportion, p.
✓ Its standard error, denoted by $\sigma_{\hat{p}}$ (say sigma sub-p-hat), equals:
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$
(Note that because n is in the denominator, the standard error decreases as n increases.)
✓ Due to the CLT, its shape is approximately normal, provided that the sample size is large enough. Therefore you can use the normal distribution to find approximate probabilities for $\hat{p}$.
✓ The larger the sample size (n), the closer the distribution of the sample proportion is to a normal distribution.

If you are interested in the number (rather than the proportion) of individuals in your sample with the characteristic of interest, you use the binomial distribution to find probabilities for your results (see Chapter 8).

How large is large enough for the CLT to work for sample proportions? Most statisticians agree that both np and n(1 – p) should be greater than or equal to 10. That is, the average number of successes (np) and the average number of failures, n(1 – p), need to be at least 10.
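If you want to check the calculations in this chapter by computer rather than with the Z-table, a short Python sketch like the following may help. It is not from the book: it assumes Python 3.8 or later and uses the standard library's `statistics.NormalDist` in place of the Z-table in the appendix, and every function name in it (`z_for_sample_mean`, `prob_mean_less_than`, `sample_proportion`, `proportion_standard_error`, `normal_approx_ok`) is hypothetical. The value p = 0.6 used for the standard-error lines is an illustrative assumption borrowed from the teen cellphone sample, not a known population proportion.

```python
import math
from statistics import NormalDist


def z_for_sample_mean(xbar: float, mu: float, sigma: float, n: int) -> float:
    """z-value for a sample mean: (xbar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)  # don't forget to divide by sqrt(n)
    return (xbar - mu) / standard_error


def prob_mean_less_than(xbar: float, mu: float, sigma: float, n: int) -> float:
    """P(X-bar < xbar) from the standard normal distribution.

    z is rounded to two decimals to mimic a Z-table lookup. The result is
    exact if X is normal and only an approximation (via the CLT) otherwise.
    """
    z = round(z_for_sample_mean(xbar, mu, sigma, n), 2)
    return NormalDist().cdf(z)


def sample_proportion(count: int, n: int) -> float:
    """p-hat: number in the sample with the characteristic divided by n."""
    return count / n


def proportion_standard_error(p: float, n: int) -> float:
    """sigma sub-p-hat = sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def normal_approx_ok(p: float, n: int) -> bool:
    """Rule of thumb: both np and n(1 - p) should be at least 10."""
    return n * p >= 10 and n * (1 - p) >= 10


if __name__ == "__main__":
    # Clerical workers: mu = 10.5, sigma = 3, n = 50; find P(X-bar < 9.5)
    print(round(z_for_sample_mean(9.5, 10.5, 3, 50), 2))    # -2.36
    print(round(prob_mean_less_than(9.5, 10.5, 3, 50), 4))  # 0.0091

    # Teens: 60 of 100 own cellphones
    print(sample_proportion(60, 100))  # 0.6

    # Illustrative assumption: treat p = 0.6 as the population proportion
    print(round(proportion_standard_error(0.6, 100), 4))  # 0.049
    print(round(proportion_standard_error(0.6, 400), 4))  # 0.0245
    print(normal_approx_ok(0.6, 100))  # True (np = 60, n(1 - p) = 40)
```

Quadrupling n from 100 to 400 halves the standard error of $\hat{p}$, which is the "n in the denominator" effect noted in the list of properties. With a small proportion such as p = 0.05 and n = 100, np is only 5, so `normal_approx_ok` returns False and the normal approximation should not be trusted.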
17_9780470911082-ch11.indd 175 3/25/11 10:01 PM