Page 48 - Intermediate Statistics for Dummies

                                             Chapter 1: Beyond Number Crunching: The Art and Science of Data Analysis
                                                    Linear regression
                                                    After you’ve determined that two variables have a fairly strong linear rela-
                                                    tionship, you may want to try to make predictions for one variable based on
                                                    the value of the other variable. For example, if you know that a fairly strong
                                                    negative linear relationship exists between coffees sold and the air tempera-
                                                    ture at a football game, you may want to use this information to predict how
                                                    much coffee is needed for a game, just by knowing the temperature. The
                                                    method of finding the best-fitting line for making such predictions is called
                                                    linear regression.
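To make "best-fitting line" concrete, here is a minimal sketch that assumes "best-fitting" means the ordinary least-squares line, which is the usual meaning in simple linear regression. The function name fit_line and the four data points are made up for illustration; they are not from the book.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up illustration data (NOT the coffee data from the book).
temperatures = [0, 10, 20, 30]
coffees = [100, 80, 60, 40]

intercept, slope = fit_line(temperatures, coffees)
print(intercept, slope)  # 100.0 -2.0
```

Because the made-up points fall exactly on a line, the fit recovers that line exactly; real data scatter around the line, and the same formulas give the line that minimizes the squared vertical distances.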
                                                    In the coffees and temperature example (see Figure 1-5), the best-fitting line
                                                    has the equation y = 49,337 – 554 * x, where x is temperature and y is the
                                                    number of coffees sold. So when the temperature (x) is zero degrees, you can
                                                    expect to sell around 49,337 coffees (this is how you interpret the y-intercept
                                                    of the line). To interpret the slope of this line, think of –554 as –554 divided
                                                    by one and use the old rise-over-run idea using coffees and degrees of
                                                    temperature. Applied here, it means that for every one-degree increase in
                                                    temperature, you can expect the coffee sales to decrease by 554. You can use this
                                                    line to make predictions for reasonable values of the temperature (x). For
                                                    example, if the temperature is a cold 20 degrees Fahrenheit, you can predict
                                                    that the number of coffees sold will be around 49,337 – 554 * 20 = 38,257.
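The prediction above can be sketched in a few lines of code. The intercept and slope come from the coffee example in the text; the function name predict_coffees is a hypothetical helper chosen for illustration.

```python
# Coefficients of the best-fitting line from the coffee example in the text.
INTERCEPT = 49_337  # expected coffees sold when the temperature is 0 degrees
SLOPE = -554        # change in coffees sold for each one-degree increase


def predict_coffees(temperature_f):
    """Predict coffees sold at a given temperature (degrees Fahrenheit)."""
    return INTERCEPT + SLOPE * temperature_f


print(predict_coffees(20))                        # 38257
print(predict_coffees(21) - predict_coffees(20))  # -554
```

The second line of output is the slope again: moving one degree warmer lowers the prediction by 554 coffees, whichever temperature you start from.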
                                                    When you use only one variable to predict the response, the method of
                                                    regression is called simple linear regression. (I review the basics of simple
                                                    linear regression in Chapter 4. But many other types of regression are out
                                                    there, many of which I discuss in this book.)
                                                    Most researchers use more than one variable to predict a response; this tech-
                                                    nique is called multiple linear regression. (Check out Chapter 5 for the details
                                                    about multiple linear regression.) Multiple linear regression has many issues
                                                    of its own because some variables you can use in the model may be related
                                                    to each other, making overlapping contributions to the response. That possi-
                                                    bility of overlapping makes their individual contributions hard to track. You
                                                    also have to watch for interaction effects when using more than one variable
                                                    to predict a response.
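One rough way to spot predictors that overlap is to check how strongly they are correlated with each other before putting them in the same model. The sketch below computes the Pearson correlation by hand; the function name correlation and all three predictor lists are made up for illustration and are not from the book.

```python
import math


def correlation(xs, ys):
    """Return the Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up predictor values (NOT from the book).
predictor_a = [1, 2, 3, 4, 5]
predictor_b = [2, 4, 6, 8, 10]  # exactly twice predictor_a: complete overlap
predictor_c = [2, 1, 4, 3, 5]   # related to predictor_a, but not perfectly

print(correlation(predictor_a, predictor_b))  # 1.0
print(correlation(predictor_a, predictor_c))  # 0.8
```

A correlation near 1 or –1 between two predictors suggests they carry much of the same information, which is the situation that makes their individual contributions hard to separate. This check only looks at pairs of variables, so treat it as a first look rather than a full diagnosis.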
                                                    Simple and multiple linear regression assume that the response variable (the
                                                    one being studied) is quantitative in nature (that is, it measures or counts
                                                    something). However, you may be interested in making predictions about a
                                                    variable that has only two outcomes: yes or no. For example, you may want to
                                                    predict whether a certain horse will win a race, whether a baby will be a girl
                                                    or a boy, or whether a tropical storm is going to make landfall. These situations
                                                    require a different kind of regression called logistic regression. (I discuss logis-
                                                    tic regression in Chapter 8.)
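As a small preview of why a yes-or-no response needs a different model, the sketch below shows the logistic function, which turns any number into a probability between 0 and 1. The function names and the coefficients passed to predict_probability are hypothetical, chosen only for illustration; they are not fitted to any data and do not come from the book.

```python
import math


def logistic(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, intercept, slope):
    """Probability of a 'yes' outcome for predictor value x.

    The straight-line part (intercept + slope * x) is the same kind of
    expression used in simple linear regression; the logistic function
    squeezes it into the 0-to-1 range so it can be read as a probability.
    """
    return logistic(intercept + slope * x)


print(logistic(0))  # 0.5

# Hypothetical coefficients, for illustration only.
p = predict_probability(3, intercept=-2.0, slope=0.5)
print(0 < p < 1)  # True
```

A plain straight line could predict values below 0 or above 1, which make no sense as probabilities; passing the line through the logistic function avoids that problem.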