Page 27 - Statistics II for Dummies

Chapter 1: Beyond Number Crunching: The Art and Science of Data Analysis


                                Nothing (not even a straight line) lasts forever

                                Bill Prediction is a statistics student studying the effect of study time on
                                exam score. Bill collects data on statistics students and uses his trusty
                                software package to predict exam score using study time. His computer
                                comes up with the equation y = 10x + 30, where y represents the test score
                                you get if you study a certain number of hours (x). Notice that this model is
                                the equation of a straight line with a y-intercept of 30 and a slope of 10.

                                So Bill predicts, using this model, that if you don’t study at all, you’ll get a
                                30 on the exam (plugging x = 0 into the equation and solving for y; this point
                                represents the y-intercept of the line). And he predicts, using this model, that
                                if you study for 5 hours, you’ll get an exam score of y = (10 * 5) + 30 = 80. So,
                                the point (5, 80) is also on this line.
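
Bill’s two predictions can be reproduced in a few lines of Python. This is a minimal sketch: the function name `predict_score` is a hypothetical name chosen for illustration, and only the slope (10) and the y-intercept (30) come from Bill’s equation.

```python
SLOPE = 10      # points gained per hour of study (from y = 10x + 30)
INTERCEPT = 30  # predicted score with zero hours of study


def predict_score(hours):
    """Return the exam score predicted by Bill's line y = 10x + 30."""
    return SLOPE * hours + INTERCEPT


print(predict_score(0))  # 30
print(predict_score(5))  # 80
```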

                                But then Bill goes a little crazy and wonders what would happen if you
                                studied for 40 hours (since it always seems that long when he’s studying).
                                The computer tells him that if he studies for 40 hours, his test score is
                                predicted to be (10 * 40) + 30 = 430 points. Wow, that’s a lot of points!
                                Problem is, the exam only goes up to a total of 100 points. Bill wonders
                                where his computer went wrong.

                                But Bill puts the blame in the wrong place. He needs to remember that there are
                                limits on the values of x that make sense in this equation. For example, because
                                x is the amount of study time, x can never be a number less than zero. If you
                                plug a negative number in for x, say x = –10, you get y = (10 * –10) + 30 = –70,
                                which makes no sense. Likewise, any x greater than 7 gives a predicted score
                                above the 100-point maximum. However, the equation itself doesn’t know that,
                                nor does the computer that found it. The computer simply graphs the line you
                                give it, assuming it’ll go on forever in both the positive and negative directions.

                                After you get a statistical equation or model, you need to specify the values
                                for which the equation applies. Equations don’t know when they work and when
                                they don’t; it’s up to the data analyst to determine that. The same goes for
                                applying the results of any data analysis that you do.
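
One way to make those limits explicit is to have the prediction code refuse values of x outside a stated range. The sketch below assumes a valid range of 0 to 7 hours. The lower limit comes from the fact that study time can’t be negative. The upper limit is an assumption, chosen because 7 hours is where the line reaches the 100-point maximum; in practice the range would more likely come from the study times actually observed in the data. The names `predict_score_checked`, `MIN_HOURS`, and `MAX_HOURS` are hypothetical.

```python
MIN_HOURS = 0.0  # study time can't be negative
MAX_HOURS = 7.0  # assumed upper limit: 10 * 7 + 30 = 100, the maximum score


def predict_score_checked(hours, min_hours=MIN_HOURS, max_hours=MAX_HOURS):
    """Predict exam score with y = 10x + 30, but only inside the stated range."""
    if not (min_hours <= hours <= max_hours):
        raise ValueError(
            f"{hours} hours is outside the range [{min_hours}, {max_hours}] "
            "where this model applies"
        )
    return 10 * hours + 30


print(predict_score_checked(5))  # 80

# Both of Bill's problem inputs are rejected instead of silently extrapolated.
for bad_hours in (-10, 40):
    try:
        predict_score_checked(bad_hours)
    except ValueError as err:
        print(err)
```

Raising an error rather than clipping the result to 100 keeps the analyst in charge of deciding what to do with an out-of-range request.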


                                Data snooping isn’t cool


                                Statisticians have come up with a saying that you may have heard: “Figures
                                don’t lie. Liars figure.” Make sure that you find out about all the analyses that
                                were performed on a data set, not just the ones reported as being statistically
                                significant.











