Page 46 - Statistics II for Dummies
30 Part I: Tackling Data Analysis and Model-Building Basics
According to Figure 2-1, it appears that as the number of putts increases, so
does the golfer’s total score. The figure also shows that the two variables
increase together in a linear way; that is, the data form a pattern that
resembles a straight line. The relationship seems pretty strong: the number of
putts appears to play a big part in determining the total score.
Now you need a measure of how strong the relationship between x and y is
and whether it goes uphill or downhill. Different measures are used for
different types of patterns seen in a scatterplot. Because the relationship you
see in this case resembles a straight line, the correlation is the measure you
use to quantify it. The correlation is a number that measures how closely the
points follow a straight line. It is always between –1.0 and +1.0, and the more
closely the points follow a straight line, the closer the correlation is to –1.0
or +1.0.
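For reference, here is the usual formula for the correlation coefficient r (often called the Pearson correlation) of n data pairs (x_i, y_i), where \bar{x} and \bar{y} are the sample means. The notation is the standard textbook one and is not taken from this page; Chapter 4 covers having Minitab do the arithmetic.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator carries the sign: it is positive when large x values tend to pair with large y values, and negative when large x values tend to pair with small y values.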
✓ A positive correlation means that as x increases on the x-axis, y also
increases on the y-axis. Statisticians call this type of relationship an
uphill relationship.
✓ A negative correlation means that as x increases on the x-axis, y goes
down. Statisticians call this type of relationship — you guessed it — a
downhill relationship.
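As a rough sketch of what the software computes, the snippet below implements the correlation formula directly and labels the result uphill or downhill by its sign. The function names `correlation` and `direction` and the putts/score numbers are hypothetical and made up for illustration; they are not the book's golf data set.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r of paired data (a minimal sketch)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when x or y is constant")
    return sxy / sqrt(sxx * syy)


def direction(r):
    """Describe the sign of r using the uphill/downhill terms."""
    if r > 0:
        return "uphill"
    if r < 0:
        return "downhill"
    return "no linear trend"


# Hypothetical, made-up numbers (NOT the book's golf data).
putts = [30, 32, 28, 35, 31]
score = [72, 75, 70, 80, 74]
r = correlation(putts, score)
print(round(r, 3), direction(r))
```

In Python 3.10 or later, the standard library's `statistics.correlation(putts, score)` should give the same value, which makes a handy cross-check.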
For the golf data set, the correlation is 0.896, or about 0.90, which is
extremely high as correlations go. The sign of the correlation is positive, so
as the number of putts increases, the total score increases too (an uphill
relationship). For instructions on calculating a correlation in Minitab, see
Chapter 4.
Predicting y using x
If you want to predict some response variable (y) using one explanatory
variable (x) and you want to use a straight line to do it, you can use simple
linear regression (see Chapter 4 for all the fine points on this topic). Linear
regression finds the best-fitting line — called the regression line — that cuts
through the data set. After you get the regression line, you can plug in a value
of x and get your prediction for y. (For instructions on using Minitab to find
the best-fitting line for your data, see Chapter 4.)
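To make "plug in a value of x" concrete, the sketch below fits a line using the usual least-squares criterion for simple linear regression (slope b = Sxy / Sxx, intercept a = ȳ − b·x̄) and then predicts y for a new x. The helper names `fit_line` and `predict` and the data values are hypothetical, chosen only for illustration; Chapter 4 and Minitab remain the book's route to the actual golf regression line.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


def predict(a, b, x):
    """Plug a value of x into the fitted line to get the predicted y."""
    return a + b * x


# Hypothetical, made-up numbers (NOT the book's golf data).
putts = [30, 32, 28, 35, 31]
score = [72, 75, 70, 80, 74]
a, b = fit_line(putts, score)
print(f"predicted score = {a:.2f} + {b:.3f} * putts")
print("prediction for 33 putts:", round(predict(a, b, 33), 1))
```

Python 3.10 and later also offer `statistics.linear_regression(putts, score)`, which returns the slope and intercept and can be used to check the hand-rolled version.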
Returning to the golf example from the previous section, suppose you want to
predict the total score you can expect for a certain number of putts. In this
case, you want to calculate the linear regression line. When you run a
regression analysis on the data set, the computer tells you that the best line
for predicting total score from number of putts is the following: