Page 42 - Intermediate Statistics for Dummies
Chapter 1: Beyond Number Crunching: The Art and Science of Data Analysis
Hypothesis test
A hypothesis test is a statistical procedure that you use to test an existing
claim about the population, using your data. The claim is denoted by Ho (the
null hypothesis). If your data are consistent with the claim, you fail to reject Ho. If your
data aren’t consistent with the claim, you reject Ho and conclude in favor of the alternative
hypothesis, Ha. Most people conduct a hypothesis test not
merely to show that their data support an existing claim, but rather to show
that the existing claim is false, in favor of the alternative hypothesis.
The Pew Research Center studied the percentage of people who go to ESPN
for their sports news. Their statistics, based on a survey of about 1,000
people, found that in 2000, 23 percent of people said they go to ESPN, while in
2004, only 20 percent reported going to ESPN. The question is this: Does this
3-percentage-point drop in viewers from 2000 to 2004 represent a significant trend
that ESPN should worry about?
To test this difference formally, you can set up a hypothesis test. You set
up your null hypothesis as the result you have to believe without your study,
Ho: no difference exists between 2000 and 2004 data for ESPN viewership.
Your alternative hypothesis (Ha) is that a difference exists.
In very general terms, here’s what’s happening with a hypothesis test. You
have the sample data, and you find the statistics that are relevant. In this
case, you have two sample percentages, one for 2000 and one for 2004. You
take the difference between the two sample percentages (3 percent), and divide it by the
standard error for the difference. The standard error measures how much the
difference in the statistics is expected to change from sample to sample. In
this case, the standard error comes to about 1.8 percent (for specific calculations,
see Chapter 3).
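As a rough check on that 1.8 percent figure, the sketch below computes a standard error for the difference between two sample percentages. It assumes the common unpooled formula and assumes 1,000 respondents in each year (the text says only "about 1,000"); the function name is illustrative, and Chapter 3 remains the reference for the exact calculation.

```python
import math

def se_difference(p1, n1, p2, n2):
    """Standard error of (p1 - p2) for two independent sample proportions.

    Uses the unpooled formula sqrt(p1(1-p1)/n1 + p2(1-p2)/n2), which is an
    assumption here; the book's own calculation is in Chapter 3.
    """
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Assumed sample sizes: 1,000 respondents in each survey year.
se = se_difference(0.23, 1000, 0.20, 1000)
print(f"standard error = {se:.3f}")  # about 0.018, i.e. 1.8 percent
```

Under these assumptions the result rounds to 0.018, matching the figure quoted above.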
Taking the difference in the statistics (3 percent = 0.03) divided by the standard
error (1.8 percent = 0.018) gives you the value of 1.67 (called the test
statistic). This value represents the difference between the two statistics, in
terms of number of standard errors. This result has a universal interpretation.
Roughly speaking, if your test statistic falls between –2.00 and +2.00,
that means the results you found don’t differ enough to get excited about,
because when no real difference exists, the test statistic lands in this range
about 95 percent of the time just by chance. (And
this example falls right into that situation.) After you take the variability of
the sample results into account, the difference in these particular samples
doesn’t transfer over to the populations they represent. So, because you
can’t reject Ho, you don’t have enough evidence to say the percentage of viewers of ESPN in the
entire population changed from 2000 to 2004.
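The arithmetic and the rough decision rule above can be sketched in a few lines. The numbers come straight from the text; the function names and the cutoff parameter are illustrative choices, not part of the original.

```python
def z_statistic(difference, standard_error):
    """Express a difference in units of standard errors."""
    return difference / standard_error

def reject_null(statistic, cutoff=2.00):
    """Rough rule from the text: reject Ho only if the statistic falls outside +/- cutoff."""
    return abs(statistic) > cutoff

z = z_statistic(0.03, 0.018)  # difference and standard error quoted in the text
print(f"test statistic = {z:.2f}")
print("reject Ho" if reject_null(z) else "fail to reject Ho")
```

With the values from the text, the statistic rounds to 1.67 and the rule reports a failure to reject Ho.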
Because you have a 95 percent confidence level, this test uses a significance
level (α level) of 1 – 0.95 = 0.05 or 5 percent. This percentage is the chance of
rejecting Ho when Ho is actually true; that is, how often a result extreme
enough to reject Ho would show up just by chance.
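To see where the rough cutoff of 2.00 comes from, the sketch below converts a significance level into a two-sided cutoff on the standard normal curve. It assumes the test statistic follows a standard normal distribution when Ho is true, which the text doesn’t spell out here; the function name is illustrative, and `NormalDist` comes from Python’s standard library.

```python
from statistics import NormalDist

def two_sided_cutoff(alpha):
    """Cutoff c such that a standard normal value lands outside +/- c with probability alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)

alpha = 1 - 0.95  # significance level implied by 95 percent confidence
cutoff = two_sided_cutoff(alpha)
print(f"alpha = {alpha:.2f}, cutoff = {cutoff:.2f}")  # cutoff is about 1.96, close to 2.00
```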