Understanding a Logistic Regression Model
In a logistic regression, you’re estimating the probability that an event occurs
for a randomly selected individual versus the probability that the event
doesn’t occur. In essence, you’re looking at yes or no data: yes it occurred
(probability = p); or no, it didn’t occur (probability = 1 – p). Yes or no data
that come from a random sample have a binomial distribution with
probability of success (the event occurring) equal to p.
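If you want to see what that kind of data actually looks like, here's a small sketch in Python; the sample size and the value of p are made up purely for illustration, and it just generates yes or no outcomes from a binomial setup with a known p:

```python
import numpy as np

# Made-up example: generate yes/no outcomes for n = 200 individuals
# when the true probability of the event (a "yes") is p = 0.30.
rng = np.random.default_rng(seed=1)
n, p = 200, 0.30
outcomes = rng.binomial(n=1, p=p, size=n)   # each outcome is 1 (yes) or 0 (no)

print(outcomes[:10])                        # the first ten yes/no results
print(outcomes.sum(), "yeses out of", n)    # total number of yeses
```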
In the binomial problems you saw in Stats I, you had a sample of n trials, you
had yes or no data, and you had a probability of success on each trial, denoted
by p. In Stats I, the value of p was always given to you; for example, a fair coin
has probability p = 0.50 of coming up heads. But in Stats II, you operate under
the much more realistic scenario that p isn't known. Because p isn't known,
your job is to estimate it, and you use a model to do that.
To estimate p, the chance of an event occurring, you need data that come in
the form of yes or no, indicating whether or not the event occurred for each
individual in the data set.
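As a quick illustration (with made-up yes or no data), when all you have are the raw results, the natural estimate of p is simply the proportion of yeses in the sample; the logistic regression model described next goes a step further and lets that probability depend on other variables:

```python
import numpy as np

# Hypothetical yes/no data for 20 individuals (1 = event occurred, 0 = it didn't).
y = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0])

p_hat = y.mean()    # the sample proportion of yeses is the estimate of p
print(f"Estimated probability that the event occurs: {p_hat:.2f}")
```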
Because yes or no data don't have a normal distribution, which is a
condition needed for other types of regression, you need a new type of regression
model to do this job; that model is logistic regression.
How is logistic regression different from other regressions?
You use logistic regression when you want to use a quantitative variable to
predict or guess the outcome of a categorical variable that has only two outcomes
(for example, using barometric pressure to predict whether or not it will rain).
A logistic regression model ultimately gives you an estimate for p, the probability
that a particular outcome will occur in a yes or no situation (for example, the
chance that it will rain versus not). The estimate is based on information from one
or more explanatory variables; you can call them x₁, x₂, x₃, . . . , xₖ. (For example,
x₁ = humidity, x₂ = barometric pressure, x₃ = cloud cover, . . . , and xₖ = wind speed.)
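To see what fitting such a model looks like in practice, here's a minimal sketch in Python using the statsmodels library; the humidity and rain values are made up just to show the mechanics, and any statistics package will fit the same kind of model:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data, made up for illustration: humidity (in percent) for 12 days
# and whether it rained that day (1 = yes, 0 = no).
humidity = np.array([35, 42, 50, 55, 60, 63, 68, 72, 75, 80, 85, 90])
rain     = np.array([ 0,  0,  0,  0,  0,  1,  0,  1,  1,  1,  1,  1])

X = sm.add_constant(humidity.astype(float))   # add the intercept term
model = sm.Logit(rain, X).fit()               # fit the logistic regression model
print(model.summary())                        # estimated intercept and slope

# Estimated probability of rain on a new day with 70% humidity
print(model.predict([[1.0, 70.0]]))           # row is [intercept, humidity]
```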
Because you’re trying to use one variable (x) to make a prediction for another
variable (y), you may think about using regression — and you would be right.
However, you have many types of regression to choose from, and you need