Page 61 - Intermediate Statistics for Dummies

                                         Part I: Data Analysis and Model-Building Basics
                                                    the closer the correlation is to –1.0 or +1.0. A positive correlation means that
                                                    as x increases on the x-axis, y also increases on the y-axis. Statisticians call
                                                    this type of relationship an uphill relationship. A negative correlation means
                                                    that as x increases on the x-axis, y goes down. Statisticians call this type of
                                                    relationship — you guessed it — a downhill relationship.
                                                    For the golf data set, the correlation is 0.896 ≈ 0.90, which is extremely high
                                                    as correlations go. This strong correlation (close to +1.0) is a good thing
                                                    because it means number of putts can do a great job of predicting total score.
                                                    Because the sign of the correlation is positive, it means as you increase
                                                    number of putts, your total score increases (an uphill relationship). For
                                                    instructions on calculating a correlation in Minitab, see Chapter 4.
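To make the uphill/downhill idea concrete, here is a minimal Python sketch of the correlation calculation. The numbers are made up for illustration and are not the golf data from Table 2-2; the function name `pearson_r` and the variable names are likewise assumptions, not anything from Minitab.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only (NOT the book's Table 2-2 data)
putts = [28, 30, 32, 34, 36]
scores = [82, 86, 87, 92, 95]

r_up = pearson_r(putts, scores)                   # y rises with x: positive r
r_down = pearson_r(putts, [-s for s in scores])   # y falls with x: negative r

print(f"uphill r = {r_up:.3f}, downhill r = {r_down:.3f}")
```

Flipping the sign of every y value turns the uphill relationship into a downhill one of the same strength, so the two correlations differ only in sign.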
                                                    Making predictions
                                                    If you want to predict some response variable (y) using one explanatory vari-
                                                    able (x), and you want to use a straight line to do it, you can use simple linear
                                                    regression (see Chapter 4 for all the fine points on this topic). Linear regres-
                                                    sion finds the best-fitting line that cuts through the data set, called the regres-
                                                    sion line. After you get the regression line, you can plug in a value of x and
                                                    get your prediction for y. (For instructions on using Minitab to find the best-
                                                    fitting line for your data, see Chapter 4.)
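As a rough illustration of what "best-fitting line" means, the sketch below computes a slope and intercept by ordinary least squares, the usual method behind simple linear regression. It assumes made-up data (not Table 2-2), and the names `fit_line`, `putts`, and `scores` are hypothetical; it is not a substitute for the Minitab steps in Chapter 4.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical values for illustration only (NOT the book's Table 2-2 data)
putts = [28, 30, 32, 34, 36]
scores = [82, 86, 87, 92, 95]

intercept, slope = fit_line(putts, scores)

# Plug in a value of x to get a prediction for y
prediction = intercept + slope * 33
print(f"score = {intercept:.1f} + {slope:.2f} * putts; at 33 putts: {prediction:.1f}")
```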
                                                    To use the golf example from the previous section, suppose you want to pre-
                                                    dict the total score you can get for a certain number of putts. In this case, you
                                                    want to calculate the linear regression line. By using the data set shown in
                                                    Table 2-2, and running a regression analysis, the computer tells you that the
                                                    best line to use to predict total score using number of putts is the following:
                                                        Total score = 39.6 + 1.52 * Number of putts
                                                    So if you have 35 putts in an 18-hole golf course, your total score is predicted
                                                    to be about 39.6 + 1.52  *  35 = 92.8, or 93. (Not bad for 18 holes!)
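The same arithmetic can be written as a small Python function. The intercept (39.6) and slope (1.52) come from the regression line above; the function name `predict_total_score` is just an illustrative choice.

```python
INTERCEPT = 39.6  # from the regression line for the golf data
SLOPE = 1.52      # predicted strokes added per putt


def predict_total_score(putts):
    """Predicted total score for a given number of putts."""
    return INTERCEPT + SLOPE * putts


predicted = predict_total_score(35)
print(f"Predicted total score for 35 putts: {predicted:.1f} (about {round(predicted)})")
```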
                                                    Notice that the slope of the regression line tells you what you really want to
                                                    know — how much does your total score increase with every additional putt?
                                                    In other words, how much damage is done when you miss the hole on your
                                                    first, or second, or third putt? The slope of the regression line for the golf
                                                    data set is 1.52. Because the slope of a line is the ratio of the change in y
                                                    (total score) to the change in x (number of putts), this means that every
                                                    additional putt you need raises your predicted total score by 1.52 strokes.
                                                    Maybe that’s why Tiger Woods spends so much time on his short game.
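One quick way to check this reading of the slope is to compare predictions that differ by a known number of putts. The sketch below reuses the fitted line from this section (intercept 39.6, slope 1.52); the function and variable names are illustrative assumptions.

```python
INTERCEPT = 39.6
SLOPE = 1.52


def predict_total_score(putts):
    """Predicted total score from the golf regression line."""
    return INTERCEPT + SLOPE * putts


# One extra putt changes the prediction by the slope
one_extra = predict_total_score(36) - predict_total_score(35)

# Three extra putts change it by three times the slope
three_extra = predict_total_score(38) - predict_total_score(35)

print(f"1 extra putt: +{one_extra:.2f} strokes; 3 extra putts: +{three_extra:.2f} strokes")
```

The starting point (35 putts here) does not matter: on a straight line, the change in the prediction depends only on how many putts are added.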