Page 41 - Intermediate Statistics for Dummies
Part I: Data Analysis and Model-Building Basics
up with a good guess (estimate) of the population parameter, if you play your
cards right. A subset of this population is called a sample. A sample statistic is
a single number that summarizes that subset of the population.
For example, in the cell-phone scenario, you select a sample of teenagers and
measure the length of their cell-phone calls over a period of time (or look at
their cell-phone records if you can gain access legally). You take the average
of the cell-phone call lengths. Say the average length of 100 cell-phone
calls is 12.2 minutes; this average is a statistic. This particular
statistic is called the sample mean, because it’s the average value from your
sample data.
You can also find a statistic called the sample proportion (the proportion of
individuals in the sample that have a certain characteristic — for example, the
percentage of female teens who use cell phones). Many different statistics are
available (which you probably picked up in intro stats) to study different
characteristics of a sample, such as the median, variance, and standard deviation.
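As a minimal sketch of the statistics named above, the following Python snippet computes a sample mean, median, variance, standard deviation, and sample proportion. The call lengths and the yes/no characteristic are made-up illustration values, not data from the cell-phone scenario, and the variable names are hypothetical.

```python
import statistics

# Hypothetical call lengths in minutes (made-up values for illustration only).
call_lengths = [10.5, 14.0, 9.8, 12.2, 15.1, 11.6]

# Hypothetical yes/no characteristic recorded for each sampled individual.
has_characteristic = [True, False, True, True, False, True]

# Each of these is a sample statistic: one number summarizing the sample.
sample_mean = statistics.mean(call_lengths)
sample_median = statistics.median(call_lengths)
sample_variance = statistics.variance(call_lengths)  # divides by n - 1
sample_sd = statistics.stdev(call_lengths)           # square root of the variance

# Sample proportion: fraction of the sample with the characteristic.
sample_proportion = sum(has_characteristic) / len(has_characteristic)

print(f"mean={sample_mean:.2f} median={sample_median:.2f}")
print(f"variance={sample_variance:.2f} sd={sample_sd:.2f}")
print(f"proportion={sample_proportion:.2f}")
```

Note that statistics.variance and statistics.stdev use the sample (n - 1) versions of the formulas, which is the usual choice when the data is a sample rather than the whole population.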
Confidence interval
A confidence interval is a range of values that provides reasonable estimates
for a population parameter. A confidence interval is based on a sample and
the statistics that come from that sample. The main reason you want to provide
a range of possible values rather than a single number is that sample
results vary from sample to sample.
For example, say you want to estimate the percentage of people who eat
chocolate. According to the Simmons Research Bureau, 78 percent of adults
reported eating chocolate, and of those, 18 percent admitted to eating sweets
frequently. What’s missing in these results? These numbers come from only a single
sample of people, and sample results are guaranteed to vary from
sample to sample. You need some measure of how much you can expect
those results to move if you were to repeat the study.
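To make the idea of sample-to-sample variation concrete, here is a small simulation sketch. It assumes, purely for illustration, that the true population percentage of chocolate eaters is 78 percent and that each study surveys 1,000 people; the function name and the seed are hypothetical choices, not part of the original study.

```python
import random
import statistics

def simulate_sample_percentages(true_p, n, repeats, seed=0):
    """Repeat a survey of n people many times and return each sample percentage.

    true_p is an ASSUMED population proportion, used only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    results = []
    for _ in range(repeats):
        # Count how many of the n simulated respondents answer "yes".
        hits = sum(1 for _ in range(n) if rng.random() < true_p)
        results.append(100 * hits / n)
    return results

percentages = simulate_sample_percentages(true_p=0.78, n=1000, repeats=200)

# The spread of these percentages shows how much results move between studies.
spread = statistics.stdev(percentages)
print(f"lowest={min(percentages):.1f}% highest={max(percentages):.1f}%")
print(f"standard deviation across studies={spread:.2f} percentage points")
```

Every simulated study draws from the same population, yet the reported percentages differ from one run of the survey to the next; that spread is what a margin of error tries to quantify.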
This expected movement in your statistic is measured by the margin of error,
which is a certain number of standard deviations of your statistic (standard
errors) that you add and subtract to have a certain confidence in your results (see Chapter 3
for more on margin of error). If the chocolate-eater results were based on
1,000 people, the margin of error would be approximately 3 percent, meaning
the actual percentage of people who eat chocolate in the entire population is
expected to be 78 percent, plus or minus 3 percent. In other words, it’s
somewhere between 75 percent and 81 percent. Now if you only base these results
on a sample of 100 people, the margin of error balloons to 10 percent,
meaning the percentage of chocolate eaters can only be reported to be between 68
and 88 percent. Notice how much wider the interval becomes when a smaller
sample size is used. This result confirms that more data means more
precision in your results (provided the data is collected properly).
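The margins of error quoted above can be roughly reproduced with the standard formula for a proportion, z * sqrt(p * (1 - p) / n). The sketch below assumes a 95 percent confidence level (z = 1.96), which the text does not state explicitly; the quoted figures of about 3 and 10 percent appear consistent with the conservative choice p = 0.5. The function name margin_of_error is a hypothetical helper.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a sample proportion.

    n: sample size
    p: proportion used in the formula (0.5 gives the widest, conservative margin)
    z: number of standard errors; 1.96 is an ASSUMED 95 percent confidence level
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 1000):
    conservative = margin_of_error(n)         # worst case, p = 0.5
    plug_in = margin_of_error(n, p=0.78)      # using the observed 78 percent
    print(f"n={n}: conservative={conservative:.1%}, with p=0.78: {plug_in:.1%}")
```

Because n sits under a square root, a tenfold increase in sample size shrinks the margin of error by a factor of about 3.2, not 10, which is why the interval narrows from roughly 10 points to roughly 3.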