Page 155 - Statistics II for Dummies

Chapter 8: Making Predictions by Using Logistic Regression
                                  to determine what kind is most appropriate here. You need the type of
                                  regression that uses a quantitative variable (x) to predict the outcome of
                                  some categorical variable (y) that has only two outcomes (yes or no).

                                  So being the good Stats II student that you are, you go to your trusty list of
                                  statistical techniques, and you look under regression — and immediately see
                                  more than one type.
                                   ✓ You see simple linear regression. No, you use that when you have one
                                      quantitative variable predicting another (see Chapter 4).
                                   ✓ Multiple regression? No, that method just expands simple linear
                                      regression to add more x variables (see Chapter 5).
                                   ✓ Nonlinear regression? Well no, that still works with two quantitative
                                      variables; it’s just that the data form a curve, not a line.
                                  But then you come across logistic regression, and . . . eureka! You see that
                                  logistic regression handles situations where the x variable is numerical and
                                  the y variable is categorical with two possible categories. Just what you’re
                                  looking for!
                                  Logistic regression, in essence, estimates the probability of y being in one
                                  category or the other, based on the value of some quantitative variable, x. For
                                  example, suppose you want to predict someone’s gender based on height.
                                  Because gender is a categorical variable, you use logistic regression to make
                                  these predictions. Suppose a 1 indicates a male. People who receive a
                                  probability of more than 0.5 of being male (based on their heights) are predicted
                                  to be male, and people who receive a probability of less than 0.5 of being male
                                  (based on their heights) are predicted to be female.
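
The 0.5 cutoff rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `predict_gender` and the sample probabilities are made up, and the handling of a probability of exactly 0.5 is an assumption, because the rule above only covers values above and below 0.5.

```python
def predict_gender(p_male, cutoff=0.5):
    """Turn an estimated probability of being male into a predicted category."""
    if p_male > cutoff:
        return "male"
    if p_male < cutoff:
        return "female"
    # Assumption: exactly at the cutoff, no prediction is made.
    return "undecided"


# Made-up probabilities, for illustration only
for p in (0.82, 0.35, 0.50):
    print(p, "->", predict_gender(p))
```

The probabilities themselves would come from a fitted logistic regression model; this sketch covers only the final step of turning a probability into a category.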

                                  In this chapter, I present only the case where you use one explanatory
                                  variable to predict the outcome. You can extend the ideas in exactly the same way
                                  as you can extend the simple linear regression model to a multiple regression
                                  model.


                                  Using an S-curve to estimate probabilities


                                  In a simple linear regression model, the general form of a straight line is
                                  $y = \beta_0 + \beta_1 x$, and y is a quantitative variable. In the logistic
                                  regression model, the y variable is categorical, not quantitative. What you’re
                                  estimating, however, is not which category the individual lies in, but rather what
                                  the probability is that the individual lies in a certain category. So, the model for
                                  logistic regression is based on estimating this probability, called p.
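
To see why the probability p traces an S-shaped curve, here is a minimal Python sketch of the standard logistic function, which maps the linear expression b0 + b1*x to a value between 0 and 1. The coefficients `B0` and `B1` are hypothetical values chosen only so that the curve crosses p = 0.5 at a height of 68 inches; they are not estimates from any data set.

```python
import math


def logistic_probability(x, b0, b1):
    """Standard logistic function: p = 1 / (1 + e^-(b0 + b1*x))."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients (assumption, not fitted values):
# b0 + b1*x equals 0 at x = 68, so p = 0.5 at a height of 68 inches.
B0, B1 = -34.0, 0.5

for height in (60, 64, 68, 72, 76):
    p = logistic_probability(height, B0, B1)
    print(height, round(p, 3))
```

With a positive slope, p stays near 0 for small x, climbs most steeply around the point where p = 0.5, and levels off near 1 for large x. That flattening at both ends is what gives the curve its S shape and keeps every estimate a valid probability.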











