Page 48 - Intermediate Statistics for Dummies
Chapter 1: Beyond Number Crunching: The Art and Science of Data Analysis
Linear regression
After you’ve determined that two variables have a fairly strong linear rela-
tionship, you may want to try to make predictions for one variable based on
the value of the other variable. For example, if you know that a fairly strong
negative linear relationship exists between coffees sold and the air tempera-
ture at a football game, you may want to use this information to predict how
much coffee is needed for a game, just by knowing the temperature. You make
such predictions by finding the line that best fits the data; this method is
called linear regression.
In the coffees and temperature example (see Figure 1-5), the best-fitting line
has the equation y = 49,337 – 554 * x, where x is temperature and y is the
number of coffees sold. So when the temperature (x) is zero degrees, you can
expect to sell around 49,337 coffees (this is how you interpret the y-intercept
of the line). To interpret the slope of this line, think of –554 as –554 divided
by one and use the old rise-over-run idea using coffees and degrees of
temperature. Applied here, it means that for every one degree increase in
temperature, you can expect the coffee sales to decrease by 554. You can use this
line to make predictions for reasonable values of the temperature (x). For
example, if the temperature is a cold 20 degrees Fahrenheit, you can predict
that the number of coffees sold will be around 49,337 – 554 * 20 = 38,257.
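If you want to check this arithmetic yourself, here is a minimal Python sketch of the prediction step. The intercept and slope come from the equation above; the function name predict_coffees is just an illustrative choice. The second call shows why predictions should be limited to reasonable temperatures: plug in a value that is too high and the line returns a negative number of coffees.

```python
# Best-fitting line quoted in the text: y = 49,337 - 554 * x
INTERCEPT = 49_337  # expected coffees sold at 0 degrees
SLOPE = -554        # change in coffees sold per one-degree increase


def predict_coffees(temperature):
    """Return the predicted number of coffees sold at the given temperature."""
    return INTERCEPT + SLOPE * temperature


print(predict_coffees(20))   # 38257, matching the worked example in the text
print(predict_coffees(100))  # -6063, a nonsensical negative count
```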
When you use only one variable to predict the response, the method of
regression is called simple linear regression. (I review the basics of simple
linear regression in Chapter 4. But many other types of regression are out
there, many of which I discuss in this book.)
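To give a rough idea of what finding the best-fitting line involves, the sketch below uses the standard least-squares formulas for the slope and intercept. The data points are an assumption made purely for illustration: they are generated from the coffee equation itself, not taken from the data behind Figure 1-5, so the fit should simply recover an intercept of 49,337 and a slope of –554. The function name fit_line is also hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations in x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Synthetic, illustrative points that lie exactly on the line from the text
# (NOT real sales data).
temps = [10, 20, 30, 40, 50]
coffees = [49_337 - 554 * t for t in temps]

intercept, slope = fit_line(temps, coffees)
print(intercept, slope)
```

Real data never falls exactly on a line, so in practice the fitted values are estimates, and you would normally let statistical software do this calculation.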
Most researchers use more than one variable to predict a response; this tech-
nique is called multiple linear regression. (Check out Chapter 5 for the details
about multiple linear regression.) Multiple linear regression has many issues
of its own because some variables you can use in the model may be related
to each other, making overlapping contributions to the response. That possi-
bility of overlapping makes their individual contributions hard to track. You
also have to watch for interaction effects when using more than one variable
to predict a response.
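The overlapping-contributions problem can be sketched with a small made-up example. Everything here is an assumption for illustration: the predictors x1 and x2, their values, and the helper functions solve and fit_linear are hypothetical, and the response is constructed to depend on x1 only. Because x2 tends to rise along with x1, a model using x2 alone gives it a sizable slope, while a model using both predictors should assign x2 a coefficient of about zero.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(rows, y):
    """Least-squares fit with an intercept; returns [b0, b1, ..., bk]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up predictors: x2 is related to x1 but is not an exact copy of it.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 4, 7, 8, 9, 13]
y = [5 + 2 * a for a in x1]  # the response depends on x1 only

x2_alone = fit_linear([(b,) for b in x2], y)
both = fit_linear(list(zip(x1, x2)), y)

print("x2 alone:", x2_alone)  # x2 appears to matter
print("x1 and x2:", both)     # x2's coefficient is close to zero
```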
Simple and multiple linear regression assume that the response variable (the
one being studied) is quantitative in nature (that is, it measures or counts
something). However, you may be interested in making predictions about a
variable that has only two outcomes: yes or no. For example, you may want to
predict whether a certain horse will win a race, whether a baby will be a girl
or a boy, or whether a tropical storm will make landfall. These situations
require a different kind of regression called logistic regression. (I discuss logis-
tic regression in Chapter 8.)
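As a quick preview of the idea, logistic regression produces a probability between 0 and 1 for the "yes" outcome by passing a linear combination of the predictors through the logistic function. The sketch below shows only that last step. The coefficients B0 and B1 are hypothetical values picked for illustration, not estimates from any data set, and the function names are assumptions as well.

```python
import math


def logistic(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, b0, b1):
    """Probability of a 'yes' outcome for predictor value x."""
    return logistic(b0 + b1 * x)


# Hypothetical coefficients, for illustration only
B0, B1 = -1.0, 0.5

for x in (0, 2, 10):
    print(x, round(predict_probability(x, B0, B1), 3))
```

Unlike a straight line, this curve cannot produce a prediction below 0 or above 1, which is why it suits yes-or-no responses.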