Page 96 - Statistics II for Dummies

80       Part II: Using Different Types of Regression to Make Predictions



                                If the collected data was the result of a well-designed experiment that controls
                                for possible confounding variables, you can establish a cause-and-effect rela-
                                tionship between x and y if they’re strongly correlated. Otherwise, you can’t
                                establish such a relationship. (See your Stats I text or Statistics For Dummies
                                for info regarding experiments.)


                                Extrapolation: The ultimate no-no

                                Plugging values of x into the model that fall outside of the reasonable
                                boundaries of x is called extrapolation. And one of my colleagues sums up
                                this idea very well: “Friends don’t let friends extrapolate.”

                                When you determine a best-fitting line for your data, you come up with an
                                equation that allows you to plug in a value for x and get a predicted value for
                                y. In algebra, if you find the equation of a line and graph it, the line typically
                                has an arrow on each end indicating it goes on forever in either direction. But
                                that doesn’t work for statistical problems (because statistics represents the
                                real world). When you’re dealing with real-world units like height, weight, IQ,
                                GPA, house prices, and the weight of your statistics textbook, only certain
                                numbers make sense.

                                So the first point is, don’t plug in values for x that don’t make any sense.
                                For example, if you’re estimating the price of a house (y) by using its square
                                footage (x), you wouldn’t think of plugging in a value of x like 10 square feet
                                or 100 square feet, because houses simply aren’t that small.

                                You also wouldn’t think about plugging in values like 1,000,000 square feet
                                for x (unless your “house” is the Ohio State football stadium or something).
                                It wouldn’t make sense. Likewise, if you’re estimating tomorrow’s tempera-
                                ture using today’s temperature, negative numbers for x could possibly make
                                sense, but if you’re estimating the amount of precipitation tomorrow given
                                the amount of precipitation today, negative numbers for x (or y for that
                                matter) don’t make sense.

                                Choose only reasonable values of x for which you try to make estimates
                                about y — that is, look at the values of x for which your data was collected,
                                and stay within those bounds when making predictions. In the textbook-
                                weight example, the smallest average student weight is 48.5 pounds, and
                                the largest average student weight is 142 pounds. Choosing student weights
                                between 48.5 and 142 to plug in for x in the equation is okay, but choosing
                                values less than 48.5 or more than 142 isn’t a good idea. You can’t guarantee
                                that the same linear relationship (or any linear relationship for that matter)
                                continues outside the given boundaries.
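
If you make predictions in software, one way to follow this advice is to have the prediction function refuse any x outside the range of the data. The short Python sketch below shows the idea for the textbook-weight example. The bounds 48.5 and 142 come from the data described above. The intercept and slope are made-up placeholders for illustration only (they are not the fitted values from the example), and the function name is hypothetical.

```python
# Observed range of x (average student weight, in pounds) from the data.
X_MIN = 48.5
X_MAX = 142.0

# Placeholder coefficients, assumed purely for illustration.
# Substitute the intercept and slope from your own fitted line.
INTERCEPT = 3.7
SLOPE = 0.11


def predict_textbook_weight(student_weight):
    """Predict y from x, but only inside the observed range of x."""
    if not (X_MIN <= student_weight <= X_MAX):
        # Outside the data, the linear relationship may not hold.
        raise ValueError(
            f"x = {student_weight} is outside the observed range "
            f"[{X_MIN}, {X_MAX}]; predicting here would be extrapolation."
        )
    return INTERCEPT + SLOPE * student_weight
```

Calling predict_textbook_weight(100) returns a prediction, while predict_textbook_weight(30) or predict_textbook_weight(200) raises an error instead of quietly returning a number you can't trust.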












