Page 28 - Statistics II for Dummies
12 Part I: Tackling Data Analysis and Model-Building Basics
Suppose Bill Prediction (from the previous section) decides to try to predict
scores on a biology exam based on study time, but this time his model
doesn’t fit. Not one to give in, Bill insists there must be some other factors
that predict biology exam scores besides study time, and he sets out to find
them.
Bill measures everything from soup to nuts. His set of 20 possible variables
includes study time, GPA, previous experience in statistics, math grades in
high school, and whether the student chews gum during the exam. After running
a multitude of correlation analyses, Bill finds that the variables related
to exam score are study time, math grades in high school, GPA, and gum
chewing during the exam. It turns out that this particular model fits pretty
well (by criteria I discuss in Chapter 5 on multiple linear regression models).
But here’s the problem: By looking at all possible correlations between his 20
variables and exam score, Bill is actually doing 20 separate statistical analyses.
Under typical conditions that I describe in Chapter 3, each statistical
analysis has a 5 percent chance of flagging a relationship just by chance when
no real relationship exists. Run 20 such analyses and you can expect about one
false alarm on average. I bet you can guess which one of Bill’s correlations
likely came out wrong in this case. And hopefully he didn’t rely on a stick of
gum to boost his grade in biology.
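To put a rough number on Bill’s problem, here is a minimal Python sketch. It assumes the 20 analyses are independent, that each uses the usual 5 percent significance level, and that none of the variables is truly related to exam score. The function names are made up for illustration, and Bill’s real correlations would not be perfectly independent, so treat the figures as ballpark.

```python
def chance_of_any_false_alarm(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_tests independent analyses
    comes out significant purely by chance (assumes no real relationships)."""
    # Each test stays quiet with probability (1 - alpha); all of them
    # staying quiet has probability (1 - alpha) ** num_tests.
    return 1.0 - (1.0 - alpha) ** num_tests


def expected_false_alarms(num_tests: int, alpha: float = 0.05) -> float:
    """Average number of chance-only significant results."""
    return num_tests * alpha


if __name__ == "__main__":
    print(f"Chance of at least one false alarm: {chance_of_any_false_alarm(20):.2f}")
    print(f"Expected number of false alarms:    {expected_false_alarms(20):.1f}")
```

Under these assumptions, 20 analyses give roughly a 64 percent chance of turning up at least one bogus relationship, which is why a single surprising "hit" such as gum chewing deserves suspicion.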
Looking at data until you find something in it is called data snooping. Data
snooping may give the researcher his five minutes of fame, but it then costs
him all credibility because no one can repeat his results.
No (data) fishing allowed
Some folks just don’t take no for an answer, and when it comes to analyzing
data, that can lead to trouble.
Sue Gonnafindit is a determined researcher. She believes that her horse can
count by stomping his foot. (For example, she says “2” and her horse stomps
twice.) Sue collects data on her horse for four weeks, recording the percentage
of time the horse gets the counting right. She runs the appropriate
statistical analysis on her data and is shocked to find no significant difference
between her horse’s results and those you would get simply by guessing.
Determined to prove her results are real, Sue looks for other types of analyses
and plugs her data into anything and everything she can find
(never mind that those analyses are inappropriate to use in her situation).
Using the famous hunt-and-peck method, she eventually stumbles
upon a significant result. However, the result is bogus because she tried
so many analyses that weren’t appropriate and ignored the results of the
appropriate analysis because it didn’t tell her what she wanted to hear.
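Sue’s hunt-and-peck approach can be imitated with a small simulation. The sketch below is only an illustration under stated assumptions: there is truly nothing to find, every analysis is a valid and independent test (so its p-value is equally likely to land anywhere between 0 and 1), and the cutoff is 5 percent. The function name, the number of simulated researchers, and the seed are all hypothetical choices. Inappropriate analyses like Sue’s can misfire even more often than this, so the simulation is, if anything, generous to her.

```python
import random


def fishing_success_rate(num_analyses: int, alpha: float = 0.05,
                         num_researchers: int = 20_000, seed: int = 1) -> float:
    """Fraction of simulated researchers who get at least one 'significant'
    result even though there is no real effect.

    Assumption: each analysis is a valid, independent test, so its p-value
    is uniform on [0, 1) when nothing is really going on.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    successes = 0
    for _ in range(num_researchers):
        # Keep trying analyses until one looks significant or we run out.
        if any(rng.random() < alpha for _ in range(num_analyses)):
            successes += 1
    return successes / num_researchers


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(f"{k:>2} analyses tried: {fishing_success_rate(k):.2f}")
```

The more analyses a determined researcher tries, the more likely she is to "find" something by luck alone, which is why a result that shows up only after a long search carries little weight.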