Page 33 - Intermediate Statistics for Dummies

                                         Part I: Data Analysis and Model-Building Basics
                                         Rule #1: Look Before You Crunch
                                                    Many people don’t realize that statistical software can’t tell you when to use
                                                    and when not to use a certain statistical technique. You have to determine that on
                                                    your own. As a result, people think they’re doing their analyses correctly, but
                                                    they can end up making all kinds of mistakes. Statistical software packages
                                                    are centered on mathematical formulas, and mathematical formulas aren’t
                                                    smart enough to know how you’re applying them or to warn you when you’re
                                                    doing something wrong (that’s where this book comes in).
                                                    In this section, I give examples of major situations where innocent data
                                                    analyses can go wrong, and I explain why it’s important to know what’s
                                                    happening behind the scenes from a statistical standpoint before you start
                                                    crunching numbers.
                                                    Nothing (even a straight line)
                                                    lasts forever
                                                    After you get a statistical equation, or model, that tries to explain or predict
                                                    some random phenomenon, you need to specify for what values the equation
                                                    applies and for what values the equation doesn’t apply. Equations don’t know
                                                    when they work and when they don’t; it’s up to the data analyst to determine
                                                    that. This idea is the same for applying the results of any data analysis that
                                                    you do.
                                                    Bill Prediction is a statistics student, studying the effect of study time on
                                                    exam score. Based on his experience, and that of a few friends, Bill comes up
                                                    with the equation y = 10x + 30, where y represents the test score you get if
                                                    you study a certain number of hours (x). This equation is Bill’s model for
                                                    predicting exam score using study time. Notice that this model is the equation of
                                                    a straight line with a y-intercept of 30 and a slope of 10.
                                                    So Bill predicts, using this model, that if you don’t study at all, you’ll get a 30
                                                    on the exam (plugging x = 0 into the equation and solving for y; this point
                                                    represents the y-intercept of the line). And he predicts, using this model, that if
                                                    you study for five hours, you’ll get an exam score of y = 10 * 5 + 30 = 80. So,
                                                    the point (5, 80) is also on this line. (I won’t talk in detail at this point about
                                                    how well Bill’s model does at predicting exam score, but you can just say he’s
                                                    got some work to do on this and leave it at that for now.)
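If you'd like to check Bill's arithmetic yourself, here is a minimal Python sketch of his model. The function name predict_score is my own label for illustration, not something from Bill or from any statistical package; the slope of 10 and the intercept of 30 come straight from his equation.

```python
def predict_score(hours):
    """Bill's straight-line model: y = 10x + 30."""
    slope = 10       # points gained per hour of study
    intercept = 30   # predicted score with zero hours of study
    return slope * hours + intercept


print(predict_score(0))  # 30, the y-intercept
print(predict_score(5))  # 80, so the point (5, 80) is on the line
```

Notice that the function does nothing more than plug a number into the equation; it has no opinion about whether that number is a sensible amount of study time.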
                                                    I’m sure you would agree that because x is the amount of study time, x
                                                    can never be a number less than zero. If you plug a negative number in for x,
                                                    say x = –10, you get y = 10 * (–10) + 30 = –70, which makes no sense. The worst
                                                    possible score, according to Bill’s model, is 30, which occurs when x equals 0.
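The formula itself will happily accept x = -10, so the check has to come from the analyst. One way to sketch that in Python is to wrap the model in a function that refuses study times where the model makes no sense. The name predict_score_checked and the choice to raise a ValueError are my assumptions for illustration; only the lower bound of zero hours comes from the discussion above.

```python
def predict_score_checked(hours):
    """Bill's model y = 10x + 30, refusing negative study times."""
    if hours < 0:
        # The equation can't know this input is meaningless; the analyst must.
        raise ValueError("study time cannot be negative")
    return 10 * hours + 30


print(10 * -10 + 30)             # -70: the bare formula does not object
print(predict_score_checked(0))  # 30: the lowest score the model allows

try:
    predict_score_checked(-10)
except ValueError as err:
    print("rejected:", err)
```

The guard covers only the lower end of the range discussed here; it's a sketch of the idea, not a complete statement of where Bill's model applies.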