Page 46 - Statistics II for Dummies

30       Part I: Tackling Data Analysis and Model-Building Basics



                                According to Figure 2-1, it appears that as the number of putts increases, so
                                does the golfer’s total score. It also shows that the variables increase in a
                                linear way; that is, the data form a pattern that resembles a straight line. The
                                relationship seems pretty strong — the number of putts plays a big part in
                                determining the total score.
                                Now you need a measure of how strong the relationship is between x and y
                                and whether it goes uphill or downhill. Different measures are used for
                                different types of patterns seen in a scatterplot. Because the relationship we
                                see in this case resembles a straight line, the correlation is the measure that
                                we use to quantify the relationship. Correlation is the number that measures
                                how closely the points follow a straight line. Correlation is always between –1.0
                                and +1.0, and the more closely the points follow a straight line, the closer the
                                correlation is to –1.0 or +1.0.

                                  ✓ A positive correlation means that as x increases on the x-axis, y also
                                    increases on the y-axis. Statisticians call this type of relationship an
                                    uphill relationship.

                                  ✓ A negative correlation means that as x increases on the x-axis, y goes
                                    down. Statisticians call this type of relationship — you guessed it — a
                                    downhill relationship.
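
As a rough illustration of how a correlation is computed, here is a minimal Python sketch of the standard Pearson correlation formula. The function name `pearson_r` and the putts and scores lists are hypothetical and made up for this example; they are not the golf data set from Figure 2-1, so the result will not match the book's value.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for illustration only
putts = [28, 30, 31, 33, 35, 36]
scores = [72, 75, 74, 80, 83, 85]

r = pearson_r(putts, scores)
direction = "uphill" if r > 0 else "downhill"
print(f"correlation = {r:.3f} ({direction})")
```

The sign of the result tells you the direction (uphill or downhill), and how close it sits to –1.0 or +1.0 tells you how tightly the points hug a straight line.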

                                For the golf data set, the correlation is 0.896, or about 0.90, which is extremely
                                high as correlations go. The sign of the correlation is positive, so as the
                                number of putts increases, the total score increases (an uphill relationship). For
                                instructions on calculating a correlation in Minitab, see Chapter 4.


                                Predicting y using x


                                If you want to predict some response variable (y) using one explanatory
                                variable (x) and you want to use a straight line to do it, you can use simple
                                linear regression (see Chapter 4 for all the fine points on this topic). Linear
                                regression finds the best-fitting line — called the regression line — that cuts
                                through the data set. After you get the regression line, you can plug in a value
                                of x and get your prediction for y. (For instructions on using Minitab to find
                                the best-fitting line for your data, see Chapter 4.)
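
To make the idea of "find the line, then plug in x" concrete, here is a minimal Python sketch. It assumes that "best-fitting" means the usual least-squares criterion, and the names `fit_line` and `predict`, along with the data values, are hypothetical and invented for illustration. This is not Minitab's output and not the golf data set.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when every x value is the same")
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Plug a value of x into the fitted line to get the predicted y."""
    return intercept + slope * x


# Hypothetical data, invented for illustration only
putts = [28, 30, 31, 33, 35, 36]
scores = [72, 75, 74, 80, 83, 85]

slope, intercept = fit_line(putts, scores)
predicted = predict(slope, intercept, 32)
print(f"predicted score for 32 putts: {predicted:.1f}")
```

A prediction like this is most trustworthy for x values inside the range of the data used to fit the line.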

                                To use the golf example from the previous section, suppose you want to
                                predict the total score you can get for a certain number of putts. In this case,
                                you want to calculate the linear regression line. By running a regression
                                analysis on the data set, the computer tells you that the best line to use to
                                predict total score using number of putts is the following:












