Page 33 - Intermediate Statistics for Dummies
Part I: Data Analysis and Model-Building Basics
Rule #1: Look Before You Crunch
Many people don’t realize that statistical software can’t tell you when to use
a certain statistical technique and when not to use it. You have to determine
that on your own. As a result, people think they’re doing their analyses
correctly when they may in fact be making all kinds of mistakes. Statistical
software packages are built on mathematical formulas, and mathematical formulas
aren’t smart enough to know how you’re applying them or to warn you when you’re
doing something wrong (that’s where this book comes in).
In this section, I give examples of the major situations where innocent-looking
data analyses can go wrong, and I explain why it’s important to know what’s
happening behind the scenes from a statistical standpoint before you start
crunching numbers.
Nothing (even a straight line) lasts forever
After you get a statistical equation, or model, that tries to explain or predict
some random phenomenon, you need to specify for what values the equation
applies and for what values the equation doesn’t apply. Equations don’t know
when they work and when they don’t; it’s up to the data analyst to determine
that. This idea is the same for applying the results of any data analysis that
you do.
Bill Prediction is a statistics student studying the effect of study time on
exam score. Based on his experience, and that of a few friends, Bill comes up
with the equation y = 10x + 30, where y represents the test score you get if
you study a certain number of hours (x). This equation is Bill’s model for
predicting exam score using study time. Notice that this model is the equation
of a straight line with a y-intercept of 30 and a slope of 10.
So Bill predicts, using this model, that if you don’t study at all, you’ll get a 30
on the exam (plugging x = 0 into the equation and solving for y; this point
represents the y-intercept of the line). And he predicts, using this model, that if
you study for five hours, you’ll get an exam score of y = 10 * 5 + 30 = 80. So,
the point (5, 80) is also on this line. (I won’t talk in detail at this point about
how well Bill’s model does at predicting exam score, but you can just say he’s
got some work to do on this and leave it at that for now.)
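If you want to check Bill’s arithmetic on a computer, here’s a minimal Python
sketch of his line. The names predict_score, SLOPE, and INTERCEPT are made up
for this illustration; they aren’t part of Bill’s work or of any statistical
software package.

```python
# Bill's model from the text: y = 10x + 30
SLOPE = 10       # predicted points gained per hour of study
INTERCEPT = 30   # predicted score with zero hours of study


def predict_score(hours):
    """Return the exam score Bill's line predicts for the given study hours."""
    return SLOPE * hours + INTERCEPT


# The two points discussed in the text: (0, 30) and (5, 80)
for hours in (0, 5):
    print(f"x = {hours} hours -> predicted score y = {predict_score(hours)}")
```

Notice that the function happily returns a number for any x you hand it, which
is exactly why the analyst, not the formula, has to decide where the model
applies.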
I’m sure you would agree that because x is the amount of study time, x can
never be a number less than zero. If you plug a negative number in for x,
say x = –10, you get y = 10 * (–10) + 30 = –70, which makes no sense. The worst
possible score, according to Bill’s model, is 30, which occurs when x equals 0.
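One way to keep a formula from being applied where it doesn’t belong is to
build the restriction into the code yourself. The sketch below is only an
illustration, assuming the single restriction stated above (study time can’t
be negative); the function name predict_score_checked and the choice to raise
a ValueError are my assumptions, not something a statistical package does for
you.

```python
def predict_score_checked(hours):
    """Apply y = 10x + 30 only where study time makes sense (x >= 0)."""
    if hours < 0:
        # The equation itself can't object, so the analyst's rule goes here.
        raise ValueError(f"study time cannot be negative: {hours}")
    return 10 * hours + 30


# The raw formula accepts x = -10 without complaint and returns nonsense.
print(10 * (-10) + 30)

# The checked version accepts x = 0 but refuses x = -10.
print(predict_score_checked(0))
try:
    predict_score_checked(-10)
except ValueError as err:
    print("rejected:", err)
```

The raw formula prints –70 without a word of protest; the checked version
stops and tells you why.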