Page 27 - Statistics II for Dummies
Chapter 1: Beyond Number Crunching: The Art and Science of Data Analysis
Nothing (not even a straight line) lasts forever
Bill Prediction is a statistics student studying the effect of study time on
exam score. Bill collects data on statistics students and uses his trusty
software package to predict exam score using study time. His computer
comes up with the equation y = 10x + 30, where y represents the test score
you get if you study a certain number of hours (x). Notice that this model is
the equation of a straight line with a y-intercept of 30 and a slope of 10.
So Bill predicts, using this model, that if you don’t study at all, you’ll get a
30 on the exam (plugging x = 0 into the equation and solving for y; this point
represents the y-intercept of the line). And he predicts, using this model, that
if you study for 5 hours, you’ll get an exam score of y = (10 * 5) + 30 = 80. So,
the point (5, 80) is also on this line.
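To see the arithmetic in action, here's a minimal Python sketch of Bill's line. The function name predict_score is made up for illustration; only the slope of 10 and the y-intercept of 30 come from Bill's equation.

```python
def predict_score(hours):
    """Predicted exam score from Bill's line, y = 10x + 30."""
    slope = 10      # points gained for each extra hour of study
    intercept = 30  # predicted score with zero hours of study
    return slope * hours + intercept


print(predict_score(0))  # the y-intercept
print(predict_score(5))  # the point on the line at x = 5
```

Notice that the function happily accepts any number you hand it. Nothing in it knows what a sensible number of study hours looks like.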
But then Bill goes a little crazy and wonders what would happen if you
studied for 40 hours (since it always seems that long when he’s studying).
The computer tells him that if he studies for 40 hours, his test score is
predicted to be (10 * 40) + 30 = 430 points. Wow, that’s a lot of points!
Problem is, the exam only goes up to a total of 100 points. Bill wonders
where his computer went wrong.
But Bill puts the blame in the wrong place. He needs to remember that there are
limits on the values of x that make sense in this equation. For example, because
x is the amount of study time, x can never be a number less than zero. If you
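plug a negative number in for x, say x = –10, you get y = (10 * –10) + 30 = –70,
which makes no sense. Large values cause the same trouble: any x above 7 pushes
the predicted score past 100, the most the exam allows. However, the equation
itself doesn’t know that, nor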
does the computer that found it. The computer simply graphs the line you
give it, assuming it’ll go on forever in both the positive and negative directions.
After you get a statistical equation or model, you need to specify for what
values the equation applies. Equations don’t know when they work and when
they don’t; it’s up to the data analyst to determine that. The same idea holds
whenever you apply the results of any data analysis.
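One way to build those limits into your work is to have the prediction refuse any x outside the range you've decided on. The sketch below is only an illustration: the function name and the limits are assumptions, not part of Bill's software. The lower limit of 0 comes from the fact that study time can't be negative; the upper limit of 7 hours is simply where 10x + 30 reaches the exam's maximum of 100. With real data, you'd more likely base the limits on the range of study times you actually observed.

```python
SLOPE = 10       # from Bill's equation y = 10x + 30
INTERCEPT = 30   # from Bill's equation y = 10x + 30
MAX_SCORE = 100  # the exam's maximum score, from Bill's example

# Assumed limits on x (not part of the fitted equation itself).
MIN_HOURS = 0
MAX_HOURS = (MAX_SCORE - INTERCEPT) / SLOPE  # 7.0 hours


def predict_score_checked(hours, min_hours=MIN_HOURS, max_hours=MAX_HOURS):
    """Predict an exam score, but only for x values where the model applies."""
    if not min_hours <= hours <= max_hours:
        raise ValueError(
            f"{hours} hours is outside the range "
            f"[{min_hours}, {max_hours}] where the model applies"
        )
    return SLOPE * hours + INTERCEPT


for hours in (5, 40, -10):
    try:
        print(hours, "->", predict_score_checked(hours))
    except ValueError as err:
        print(hours, "-> no prediction:", err)
```

Here the 5-hour prediction goes through as before, while the 40-hour and negative-hour requests are turned away instead of producing scores that can't happen.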
Data snooping isn’t cool
Statisticians have come up with a saying that you may have heard: “Figures
don’t lie. Liars figure.” Make sure that you find out about all the analyses that
were performed on a data set, not just the ones reported as being statistically
significant.
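One reason the unreported analyses matter: run enough tests and some will come out "significant" by chance alone. Here's a rough sketch of that idea, assuming each test uses a 0.05 significance level and the tests are independent of one another. Both are assumptions made for illustration, and the function name is made up.

```python
def chance_of_false_alarm(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent tests comes out
    significant when there's really nothing going on in any of them."""
    # Each test avoids a false alarm with probability (1 - alpha),
    # so all of them avoid one with probability (1 - alpha) ** num_tests.
    return 1 - (1 - alpha) ** num_tests


for num_tests in (1, 5, 20):
    print(num_tests, round(chance_of_false_alarm(num_tests), 2))
```

Under these assumptions, 20 tests give you about a 64 percent chance of at least one false alarm, which is why a lone significant result means little until you know how many analyses were tried.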