Page 30 - Statistics II for Dummies
14 Part I: Tackling Data Analysis and Model-Building Basics
Sample statistic
Typically you can’t determine population parameters exactly; you can only
estimate them. But all is not lost; by taking a sample (a subset of individuals)
from the population and studying it, you can come up with a good estimate
of the population parameter. A sample statistic is a single number that
summarizes that subset.
For example, in the cellphone scenario from the previous section, you select
a sample of teenagers and measure the duration of their cellphone calls over
a period of time (or look at their cellphone records if you can gain access
legally). You then take the average of the call durations. The average
duration of 100 cellphone calls may be 12.2 minutes, for instance; this average
is a statistic. This particular statistic is called the sample mean because it's
the average value from your sample data.
Many different statistics are available to study different characteristics of a
sample, such as the proportion, the median, and the standard deviation.
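To make these statistics concrete, here is a minimal sketch that computes each one with Python's standard `statistics` module. The call durations are made-up numbers chosen for illustration, not data from the cellphone scenario, and the 10-minute cutoff used for the proportion is likewise an assumption.

```python
import statistics

# Hypothetical call durations in minutes (made-up data for illustration only).
call_minutes = [3.5, 12.0, 8.25, 20.0, 15.5, 1.0, 9.75, 30.0]

# Sample mean: the average value in the sample.
sample_mean = statistics.mean(call_minutes)

# Sample median: the middle value once the data are sorted.
sample_median = statistics.median(call_minutes)

# Sample standard deviation (divides by n - 1).
sample_sd = statistics.stdev(call_minutes)

# Sample proportion: the share of calls longer than 10 minutes
# (the 10-minute cutoff is an arbitrary choice for this example).
long_calls = sum(1 for minutes in call_minutes if minutes > 10)
sample_proportion = long_calls / len(call_minutes)

print(f"mean = {sample_mean}, median = {sample_median}")
print(f"standard deviation = {sample_sd:.2f}, proportion = {sample_proportion}")
```

Each result is a single number that summarizes the sample, which is what makes it a statistic.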
Confidence interval
A confidence interval is a range of likely values for a population parameter. A
confidence interval is based on a sample and the statistics that come from
that sample. The main reason you want to provide a range of likely values
rather than a single number is that sample results vary.
For example, suppose you want to estimate the percentage of people who eat
chocolate. According to the Simmons Research Bureau, 78 percent of adults
reported eating chocolate, and of those, 18 percent admitted eating sweets
frequently. What’s missing in these results? These numbers are only from
a single sample of people, and those sample results are guaranteed to vary
from sample to sample. You need some measure of how much you can expect
those results to move if you were to repeat the study.
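A quick simulation can show how much sample results move around. The sketch below assumes, purely for illustration, that the true population rate is 78 percent and that each study surveys 1,000 people; neither the helper name `sample_percentage` nor these settings come from the survey itself.

```python
import random

def sample_percentage(true_rate, n, rng):
    """Survey n simulated people and return the percentage who say yes."""
    hits = sum(1 for _ in range(n) if rng.random() < true_rate)
    return 100 * hits / n

# Fixed seed so the run is repeatable; the seed value itself is arbitrary.
rng = random.Random(42)

# Repeat the same hypothetical study five times.
# The 0.78 true rate is an assumption made only for this simulation.
results = [sample_percentage(0.78, 1000, rng) for _ in range(5)]

for study, pct in enumerate(results, start=1):
    print(f"Study {study}: {pct:.1f} percent")
```

Each repeated study lands near 78 percent but rarely on exactly the same number, which is the sample-to-sample variation the text describes.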
This expected variation in your statistic from sample to sample is measured
by the margin of error: a certain number of standard deviations of your
statistic that you add and subtract to have a certain confidence in your
results (see Chapter 3 for more on margin of error). If the chocolate-eater
results were based on 1,000 people, the margin of error would be approximately
3 percent. This means the actual percentage of people who eat chocolate in the
entire population is expected to be 78 percent ± 3 percent (that is, between
75 percent and 81 percent).
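The arithmetic behind a margin of error like this one can be sketched with the usual large-sample formula for a proportion, z × sqrt(p(1 − p)/n). The text doesn't state a confidence level, so the 95 percent level (z = 1.96) used here is an assumption, and `margin_of_error` is a hypothetical helper name.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Large-sample margin of error for a sample proportion.

    z = 1.96 corresponds to 95 percent confidence, which is an
    assumption; the confidence level is not given in the text.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.78   # sample proportion of chocolate eaters
n = 1000       # sample size

moe = margin_of_error(p_hat, n)
lower, upper = p_hat - moe, p_hat + moe

# Conservative version: p = 0.5 gives the largest possible margin for this n.
conservative = margin_of_error(0.5, n)

print(f"margin of error: {100 * moe:.1f} percent")
print(f"interval: {100 * lower:.1f} to {100 * upper:.1f} percent")
print(f"conservative margin of error: {100 * conservative:.1f} percent")
```

Under these assumptions both versions round to about 3 percent, in line with the approximate figure quoted above.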