Page 106 - Intermediate Statistics for Dummies

Chapter 4: Getting in Line with Simple Linear Regression
In the textbook weight example, you estimate the average textbook weight for students of a given weight, but that doesn’t mean that increasing a particular child’s weight causes his textbook weight to increase. Because of the strong positive correlation, you do know that students with lower weights tend to have lower total textbook weights, and students with higher weights tend to have higher textbook weights. But you can’t take one particular third-grade student, increase his weight, and presto: suddenly his textbooks weigh more.
The variable underlying the relationship between a child’s weight and the weight of his textbooks is the student’s grade level: as grade level increases, so do the student’s weight and the size of his books. Student grade level drives both student weight and textbook weight. In this situation, student grade level is what statisticians call a confounding variable: a variable that wasn’t included in the study but is related to both the explanatory variable and the response, and that confounds (confuses) the issue of what is causing what to happen.
If the collected data was the result of a well-designed experiment that controls for possible confounding variables, you can establish a cause-and-effect relationship between x and y if they’re strongly correlated. Otherwise, you can’t.
Extrapolation: The ultimate no-no
Plugging values of x into the model that fall outside the reasonable boundaries of x is called extrapolation. One of my colleagues sums up this idea very well: “Friends don’t let friends extrapolate.”
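A fitted line is just an equation, so nothing stops it from returning a number for any x you feed it. The Python sketch below fits a least-squares line to a handful of hypothetical house sizes and prices (invented, and perfectly linear for simplicity) and adds a guard, `predict_checked`, that refuses x values outside the range of the observed data. Using the observed range as the “reasonable boundaries” is an assumption of this sketch, one common and conservative choice; the function names are also hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data: square footage (x) and price in dollars (y).
sqft = [1200, 1500, 1800, 2100, 2400, 3000]
price = [150_000, 180_000, 210_000, 240_000, 270_000, 330_000]

b0, b1 = fit_line(sqft, price)

def predict(x):
    """The bare equation: it returns a number for ANY x, sensible or not."""
    return b0 + b1 * x

def predict_checked(x, xs=sqft):
    """Refuse to predict outside the range of x values actually observed."""
    lo, hi = min(xs), max(xs)
    if not lo <= x <= hi:
        raise ValueError(f"x = {x} is outside the data range [{lo}, {hi}]")
    return predict(x)
```

With this data, `predict(10)` cheerfully prices a 10-square-foot “house” at $31,000, while `predict_checked(10)` raises a `ValueError` instead of extrapolating.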
When you determine a best-fitting line for your data, you come up with an equation that allows you to plug in a value for x and get a predicted value for y. In algebra, if you found the equation of a line and graphed it, the line would typically have an arrow on each end indicating that it goes on forever in either direction. But that doesn’t work for statistical problems (’cause statistics represents the real world). What I mean is that when you’re dealing with real-world units like height, weight, IQ, GPA, house prices, and the weight of your statistics textbook, only certain numbers make sense.
So the first point is: don’t plug in values for x that don’t make any sense. For example, if you’re estimating the price of a house (y) using its square footage (x), you wouldn’t think of plugging in a value of x like 10 square feet or 100 square feet, because houses simply aren’t that small. You also wouldn’t think about plugging in values like 1,000,000 square feet for x (unless your “house” is the Ohio State football stadium or the like). It wouldn’t make sense. If you’re estimating tomorrow’s temperature using today’s temperature, negative numbers for x could make sense, but if you’re estimating the amount of precipitation tomorrow given the amount of precipitation today, negative numbers for x (or y, for that matter) don’t make sense.
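The precipitation example shows that the check applies to both ends of the model: an impossible x going in, and an impossible y coming out. The Python sketch below uses a hypothetical fitted line whose coefficients `B0` and `B1` are invented for illustration, not estimated from real weather data; the function name `predict_precip` is likewise hypothetical.

```python
# Hypothetical fitted line: tomorrow's precipitation (inches) from today's.
# These coefficients are made up for illustration only.
B0, B1 = -0.05, 0.6

def predict_precip(today_in):
    """Predict tomorrow's precipitation, rejecting impossible x and y values."""
    if today_in < 0:
        # Negative precipitation is not a meaningful input.
        raise ValueError("today's precipitation (x) cannot be negative")
    y_hat = B0 + B1 * today_in
    if y_hat < 0:
        # The line itself allows negative y; real precipitation does not.
        raise ValueError(f"model predicts {y_hat:.2f} in., which is impossible")
    return y_hat
```

For a rainy day with 1 inch of precipitation, this line predicts about 0.55 inch for tomorrow. For a dry day (x = 0), the same line predicts -0.05 inch, which is exactly the kind of nonsensical y the equation will happily produce if you don’t check it.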
