Page 191 - Statistics for Dummies

175
                                         Chapter 11: Sampling Distributions and the Central Limit Theorem



    Don’t forget to divide by the square root of n in the denominator of z. Always divide by the square root of n when the question refers to the average of the x-values.

Revisiting the clerical worker example from the previous section “Sample size and standard error,” suppose X is the time it takes a randomly chosen clerical worker to type and send a standard letter of recommendation. Suppose X has a normal distribution, and assume the mean is 10.5 minutes and the standard deviation 3 minutes. You take a random sample of 50 clerical workers and measure their times. What is the chance that their average time is less than 9.5 minutes?

This question translates to finding $P(\bar{X} < 9.5)$. Because X has a normal distribution to start with, you know $\bar{X}$ also has an exact (not approximate) normal distribution. Converting to z, you get:

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{9.5 - 10.5}{3/\sqrt{50}} = \frac{-1}{0.4243} \approx -2.36$$

So you want P(Z < –2.36), which equals 0.0091 (from the Z-table in the appendix). So the chance that a random sample of 50 clerical workers averages less than 9.5 minutes to complete this task is 0.91% (very small).

How do you find probabilities for $\bar{X}$ if X is not normal, or if its distribution is unknown? Thanks to the CLT, the distribution of X can be non-normal or even unknown. As long as n is large enough, you can still find approximate probabilities for $\bar{X}$ using the standard normal (Z-)distribution and the process described earlier. That is, convert to a z-value and find approximate probabilities using the Z-table (in the appendix).

    When you use the CLT to find a probability for $\bar{X}$ (that is, when the distribution of X is not normal or is unknown), be sure to say that your answer is an approximation. You also want to say that the approximate answer should be close, because you’ve got a large enough n to use the CLT. (If n is not large enough for the CLT, you can use the t-distribution in many cases; see Chapter 10.)

    Beyond actual calculations, probabilities about $\bar{X}$ can help you decide whether an assumption or a claim about a population mean is on target, based on your data. In the clerical workers example, it was assumed that the average time for all workers to type up a recommendation letter was 10.5 minutes. Your sample averaged 9.5 minutes. The probability of a sample average below 9.5 minutes was found to be tiny (0.0091). So either you got an unusually high number of fast workers in your sample just by chance, or the assumption that the average time for all workers is 10.5 minutes was simply too high. (I’m betting on the latter.) The process of checking assumptions or challenging claims about a population is called hypothesis testing; details are in Chapter 14.

The Sampling Distribution of the Sample Proportion

The Central Limit Theorem (CLT) doesn’t apply only to sample means for numerical data. You can also use it with other statistics, including sample proportions for categorical data (see Chapter 6).

The population proportion, p, is the proportion of individuals in the population who have a certain characteristic of interest. Examples are the proportion of all Americans who are registered voters, or the proportion of all teenagers who own cellphones.

The sample proportion, denoted $\hat{p}$ (pronounced p-hat), is the proportion of individuals in the sample who have that particular characteristic. In other words, it is the number of individuals in the sample who have that characteristic of interest divided by the total sample size (n).

For example, if you take a sample of 100 teens and find 60 of them own cellphones, the sample proportion of cellphone-owning teens is $\hat{p} = 60/100 = 0.60$.

This section examines the sampling distribution of all possible sample proportions, $\hat{p}$, from samples of size n from a population.

The sampling distribution of $\hat{p}$ has the following properties:

  ✓ Its mean, denoted by $\mu_{\hat{p}}$ (pronounced mu sub-p-hat), equals the population proportion, p.
  ✓ Its standard error, denoted by $\sigma_{\hat{p}}$ (say sigma sub-p-hat), equals:

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$

    (Note that because n is in the denominator, the standard error decreases as n increases.)
  ✓ Due to the CLT, its shape is approximately normal, provided that the sample size is large enough. Therefore you can use the normal distribution to find approximate probabilities for $\hat{p}$.
  ✓ The larger the sample size (n), the closer the distribution of the sample proportion is to a normal distribution.

    If you are interested in the number (rather than the proportion) of individuals in your sample with the characteristic of interest, you use the binomial distribution to find probabilities for your results (see Chapter 8).

    How large is large enough for the CLT to work for sample proportions? Most statisticians agree that both np and n(1 – p) should be greater than or equal to 10. That is, the average number of successes (np) and the average number of failures (n(1 – p)) both need to be at least 10.
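The properties of the sampling distribution of $\hat{p}$ can be turned into a few lines of Python. This is a minimal sketch that applies the normal approximation only when np and n(1 – p) are both at least 10. It uses the standard library's `statistics.NormalDist` in place of the Z-table. The function names are hypothetical. The population proportion p = 0.55 in the usage block is an illustrative value, not one from the text.

```python
import math
from statistics import NormalDist


def phat_standard_error(p: float, n: int) -> float:
    """Standard error of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def clt_ok_for_proportion(p: float, n: int) -> bool:
    """Rule of thumb: both np and n(1 - p) must be at least 10."""
    return n * p >= 10 and n * (1 - p) >= 10


def prob_phat_below(x: float, p: float, n: int) -> float:
    """Approximate P(p-hat < x) with the normal approximation justified by the CLT."""
    if not clt_ok_for_proportion(p, n):
        raise ValueError("np or n(1 - p) is below 10; normal approximation not justified")
    z = (x - p) / phat_standard_error(p, n)  # convert to a z-value
    return NormalDist().cdf(z)  # area to the left of z under the standard normal curve


if __name__ == "__main__":
    # Hypothetical population proportion p = 0.55 (assumed, not from the text), n = 100 teens.
    p, n = 0.55, 100
    print(clt_ok_for_proportion(p, n))  # np = 55 and n(1 - p) = 45 are both >= 10
    print(round(phat_standard_error(p, n), 4))
    print(round(1 - prob_phat_below(0.60, p, n), 4))  # approximate P(p-hat >= 0.60)
```

Raising an error when the rule of thumb fails is one possible design choice. For small samples, the binomial distribution (Chapter 8) is the natural alternative when you work with counts.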
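The clerical-worker calculation (mean 10.5 minutes, standard deviation 3 minutes, n = 50) can be checked with a short Python sketch. It uses the standard library's `statistics.NormalDist` in place of the Z-table, and the function name `prob_mean_below` is hypothetical. Feeding the unrounded z (about –2.357) into the normal CDF gives roughly 0.0092. That is slightly different from the table value 0.0091, which is found with z rounded to –2.36. If X is not normal, the same function gives only the CLT-based approximation.

```python
import math
from statistics import NormalDist


def prob_mean_below(x: float, mu: float, sigma: float, n: int) -> float:
    """P(X-bar < x) when X-bar is normal (exactly, or approximately by the CLT)."""
    standard_error = sigma / math.sqrt(n)  # divide by the square root of n
    z = (x - mu) / standard_error          # convert to a z-value
    return NormalDist().cdf(z)             # left-tail area under the standard normal curve


if __name__ == "__main__":
    # Clerical-worker example: mu = 10.5 min, sigma = 3 min, n = 50, cutoff x = 9.5 min.
    z = (9.5 - 10.5) / (3 / math.sqrt(50))
    print(round(z, 2))  # -2.36
    print(round(NormalDist().cdf(round(z, 2)), 4))     # table-style answer using the rounded z
    print(round(prob_mean_below(9.5, 10.5, 3, 50), 4))  # answer using the unrounded z
```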








