Page 155 - Statistics II for Dummies
Chapter 8: Making Predictions by Using Logistic Regression
to determine what kind is most appropriate here. You need the type of
regression that uses a quantitative variable (x) to predict the outcome of
some categorical variable (y) that has only two outcomes (yes or no).
So being the good Stats II student that you are, you go to your trusty list of
statistical techniques, and you look under regression — and immediately see
more than one type.
✓ You see simple linear regression. No, you use that when you have one
quantitative variable predicting another (see Chapter 4).
✓ Multiple regression? No, that method just expands simple linear regres-
sion to add more x variables (see Chapter 5).
✓ Nonlinear regression? Well no, that still works with two quantitative
variables; it’s just that the data form a curve, not a line.
But then you come across logistic regression, and . . . eureka! You see that
logistic regression handles situations where the x variable is numerical and
the y variable is categorical with two possible categories. Just what you’re
looking for!
Logistic regression, in essence, estimates the probability of y being in one
category or the other, based on the value of some quantitative variable, x. For
example, suppose you want to predict someone’s gender based on height.
Because gender is a categorical variable, you use logistic regression to make
these predictions. Suppose a 1 indicates a male. People who receive a
probability of more than 0.5 of being male (based on their heights) are predicted
to be male, and people who receive a probability of less than 0.5 of being male
(based on their heights) are predicted to be female.
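The 0.5 cutoff rule can be sketched in a few lines of Python. This is only an illustration: the function name classify and the sample probabilities are made up, and the handling of a probability of exactly 0.5 is an assumption, because the rule above covers only values above or below the cutoff.

```python
def classify(p_male, cutoff=0.5):
    """Turn an estimated probability of being male (coded 1) into a predicted category."""
    if not 0.0 <= p_male <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    if p_male > cutoff:
        return "male"
    if p_male < cutoff:
        return "female"
    # Assumption: the rule in the text does not say what happens at exactly 0.5.
    return "undetermined"


# Hypothetical estimated probabilities, invented for illustration only.
for p in (0.82, 0.31, 0.5):
    print(p, "->", classify(p))
```

The function takes the probability as given; estimating that probability from a person’s height is the job of the logistic regression model itself.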
In this chapter, I present only the case where you use one explanatory vari-
able to predict the outcome. You can extend the ideas in exactly the same way
as you can extend the simple linear regression model to a multiple regression
model.
Using an S-curve to estimate probabilities
In a simple linear regression model, the general form of a straight line is
$y = \beta_0 + \beta_1 x$, and y is a quantitative variable. In the logistic regression model,
the y variable is categorical, not quantitative. What you’re estimating, how-
ever, is not which category the individual lies in, but rather what the proba-
bility is that the individual lies in a certain category. So, the model for logistic
regression is based on estimating this probability, called p.
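As a rough sketch of what such an S-curve looks like numerically, the standard logistic function, p = e^(b0 + b1·x) / (1 + e^(b0 + b1·x)), turns any value of x into a probability between 0 and 1. The function name logistic_p and the coefficients below are hypothetical, chosen only to show the shape of the curve; they are not fitted to any data.

```python
import math


def logistic_p(x, b0, b1):
    """Probability from the logistic S-curve: e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))


# Hypothetical intercept and slope, invented for illustration (x = height in inches).
B0, B1 = -47.0, 0.7

for height in (60, 64, 68, 72, 76):
    print(height, round(logistic_p(height, B0, B1), 3))
```

With a positive slope, the printed probabilities rise from near 0 to near 1 as x increases and never leave that range, which is why an S-curve suits probabilities better than a straight line does.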