Page 41 - Intermediate Statistics for Dummies

                                         Part I: Data Analysis and Model-Building Basics
                                                    up with a good guess (estimate) of the population parameter, if you play your
                                                    cards right. A subset of this population is called a sample. A sample statistic is
                                                    a single number that summarizes that subset of the population.
                                                    For example, in the cell-phone scenario, you select a sample of teenagers and
                                                    measure the length of their cell-phone calls over a period of time (or look at
                                                    their cell-phone records if you can gain access legally). You take the average
                                                    of the cell-phone call lengths. Say the average length of 100 cell-phone
                                                    calls is 12.2 minutes; this average is a statistic. This particular
                                                    statistic is called the sample mean, because it’s the average value from your
                                                    sample data.
                                                    You can also find a statistic called the sample proportion (the proportion of
                                                    individuals in the sample who have a certain characteristic; for example, the
                                                    percentage of female teens who use cell phones). Many different statistics
                                                    (which you probably picked up in intro stats) are available to study different
                                                    characteristics of a sample, such as the median, variance, and standard deviation.
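As a minimal sketch of the sample statistics described above, the Python snippet below computes a sample mean, median, variance, standard deviation, and sample proportion. The call lengths and the 0/1 characteristic flags are made-up illustration values, not data from the text; the call lengths were chosen so that their mean comes out to 12.2 minutes, as in the example.

```python
import statistics

# Hypothetical call lengths in minutes for a sample of 10 calls (made-up data).
call_lengths = [12.5, 8.0, 15.2, 10.1, 14.3, 9.8, 11.7, 13.4, 16.0, 11.0]

# Hypothetical 0/1 flags: 1 means the sampled teen has the characteristic of interest.
has_characteristic = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]

sample_mean = statistics.mean(call_lengths)          # average call length
sample_median = statistics.median(call_lengths)      # middle value of the sorted data
sample_variance = statistics.variance(call_lengths)  # sample variance (divides by n - 1)
sample_std_dev = statistics.stdev(call_lengths)      # square root of the sample variance

# A sample proportion is the count with the characteristic divided by the sample size.
sample_proportion = sum(has_characteristic) / len(has_characteristic)

print(f"Sample mean:        {sample_mean:.1f} minutes")
print(f"Sample median:      {sample_median:.1f} minutes")
print(f"Sample variance:    {sample_variance:.2f}")
print(f"Sample std. dev.:   {sample_std_dev:.2f} minutes")
print(f"Sample proportion:  {sample_proportion:.0%}")
```

Each of these values is a statistic: a single number that summarizes the sample, not the whole population.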
                                                    Confidence interval
                                                    A confidence interval is a range of values that provides reasonable estimates
                                                    for a population parameter. A confidence interval is based on a sample and
                                                    the statistics that come from that sample. The main reason you want to provide
                                                    a range of possible values rather than a single number is that sample
                                                    results vary from sample to sample.
                                                    For example, say you want to estimate the percentage of people who eat
                                                    chocolate. According to the Simmons Research Bureau, 78 percent of adults
                                                    reported eating chocolate, and of those, 18 percent admitted to eating sweets
                                                    frequently. What’s missing in these results? These numbers come from only a
                                                    single sample of people, and sample results are guaranteed to vary from
                                                    sample to sample. You need some measure of how much you can expect
                                                    those results to move if you were to repeat the study.
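To see this sample-to-sample variation, here is a small simulation sketch in Python. It assumes, purely for illustration, that the true population percentage is 78 percent and that each survey reaches 1,000 people; the function name `draw_sample_proportion` and the constants are hypothetical, not from the text.

```python
import random

random.seed(1)  # fixed seed so repeated runs give the same samples

TRUE_PROPORTION = 0.78  # assumed population value, for illustration only
SAMPLE_SIZE = 1000      # people surveyed in each simulated study
NUM_SAMPLES = 5         # how many times the study is repeated


def draw_sample_proportion(p, n):
    """Simulate surveying n people and return the share who answer yes."""
    yes_count = sum(1 for _ in range(n) if random.random() < p)
    return yes_count / n


sample_proportions = [
    draw_sample_proportion(TRUE_PROPORTION, SAMPLE_SIZE)
    for _ in range(NUM_SAMPLES)
]

for i, p_hat in enumerate(sample_proportions, start=1):
    print(f"Sample {i}: {p_hat:.1%} reported eating chocolate")
```

Each repeated study lands on a slightly different percentage even though the population value never changes, which is why a single number on its own does not tell the whole story.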
                                                    This expected movement in your statistic is measured by the margin of error,
                                                    which is a certain number of standard errors (standard deviations of your
                                                    statistic) that you add and subtract to have a certain confidence in your
                                                    results (see Chapter 3 for more on margin of error). If the chocolate-eater
                                                    results were based on 1,000 people, the margin of error would be approximately
                                                    3 percentage points (at roughly the 95 percent confidence level), meaning
                                                    the actual percentage of people who eat chocolate in the entire population is
                                                    estimated to be 78 percent, plus or minus 3 percentage points. In other words,
                                                    it’s likely somewhere between 75 percent and 81 percent. Now if you base these
                                                    results on a sample of only 100 people, the margin of error balloons to 10
                                                    percentage points, meaning the percentage of chocolate eaters can only be
                                                    reported to be between 68 and 88 percent. Notice how much wider the interval
                                                    becomes when a smaller sample size is used. This result illustrates that more
                                                    data means more precision in your results (provided the data is collected
                                                    properly).
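The arithmetic behind these intervals can be sketched in Python. The text does not say which formula produced its 3 percent and 10 percent figures; the sketch below assumes the common rough rule of 1 / sqrt(n) for a 95 percent margin of error on a proportion, which reproduces them, and also shows the usual formula based on the sample proportion for comparison. Both function names are hypothetical.

```python
import math


def conservative_margin_of_error(n):
    """Rough 95% margin of error for a proportion: 1 / sqrt(n) (assumed rule)."""
    return 1 / math.sqrt(n)


def margin_of_error(p_hat, n, z=1.96):
    """z standard errors of a sample proportion (z = 1.96 for about 95%)."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return z * standard_error


p_hat = 0.78  # sample proportion of chocolate eaters from the example

for n in (1000, 100):
    rough = conservative_margin_of_error(n)
    exact = margin_of_error(p_hat, n)
    low, high = p_hat - rough, p_hat + rough
    print(
        f"n = {n}: rough margin {rough:.1%} "
        f"(interval {low:.1%} to {high:.1%}), "
        f"formula-based margin {exact:.1%}"
    )
```

Under the rough rule, a sample of 100 gives a margin of exactly 10 percent and a sample of 1,000 gives about 3.2 percent. The formula that uses the sample proportion gives somewhat smaller margins (about 8.1 and 2.6 percent), but the pattern is the same: shrinking the sample widens the interval.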