Page 309 - Statistics for Dummies
Chapter 18: Looking for Links: Correlation and Regression
For example, in the chirping data, no data were collected for fewer than 18
chirps or more than 39 chirps per 15 seconds (refer to Table 18-1). If you try
to make predictions outside this range, you are going into uncharted territory;
the farther outside this range you go with your x-values, the more dubious
your predictions for y become. Who’s to say the line still works outside
the area where data were collected? Do you really think that crickets will
chirp faster and faster without limit? At some point they would either pass
out or burn up! And what does a negative number of chirps really mean?
(Is this similar to asking what the sound of one hand clapping is?)
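One way to build this caution into a calculation is to refuse any prediction whose x-value falls outside the range where data were collected. The following is a minimal Python sketch under stated assumptions: the observed range of 18 to 39 chirps per 15 seconds comes from the text, but the function name `predict` and the slope and intercept in the usage lines are hypothetical placeholders, not the fitted line from Table 18-1.

```python
# Minimal sketch: guard a regression prediction against extrapolation.
# Assumption: the line has the form y = intercept + slope * x; the slope and
# intercept used below are hypothetical, not the book's fitted values.

X_MIN, X_MAX = 18, 39  # fewest and most chirps per 15 seconds in the data


def predict(x, slope, intercept, x_min=X_MIN, x_max=X_MAX):
    """Return the predicted y for x, refusing to extrapolate past the data."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]; "
            "the line may not hold there"
        )
    return intercept + slope * x


# Hypothetical coefficients, for illustration only.
print(predict(25, slope=0.5, intercept=40))  # 52.5 (inside the range)
for x in (60, -3):  # far above the data, and a negative chirp count
    try:
        predict(x, slope=0.5, intercept=40)
    except ValueError as err:
        print("Refused:", err)
```

Raising an error is a deliberately strict choice; a softer version could return the prediction along with a warning flag instead.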
Be aware that not every data point will necessarily fit the regression line well,
even if the correlation is high. A point or two may fall outside the overall
pattern of the rest of the data; such points are called outliers. One or two outliers
probably won’t affect the overall fit of the regression line much, but the line
will fit those particular points poorly.
The numerical difference between the actual y-value from your data and the
value of y predicted by the line is called a residual. Outliers have
large residuals compared to the rest of the points; they are worth investigating
to see whether there was an error in the data at those points or whether there
is something particularly interesting in the data to follow up on. (I give a much more
detailed look at residuals in the book Statistics II For Dummies.)
Explaining the Relationship: Correlation versus Cause and Effect
Scatterplots and correlations identify and quantify relationships between two
variables. However, even if a scatterplot shows a definite pattern and the data
have a strong correlation, that doesn’t necessarily mean that a cause-and-effect
relationship exists between the two variables. A cause-and-effect
relationship is one where a change in one variable (here, X) causes a
change in another variable (here, Y). In other words, the change in Y is
not only associated with a change in X but is also directly caused by X.
For example, suppose a well-controlled medical experiment is conducted to
determine the effects of dosage of a certain drug on blood pressure. (See a
total breakdown of experiments in Chapter 17.) The researchers look at their
scatterplot and see a definite downhill linear pattern; they calculate the
correlation, and it’s strong. They conclude that increasing the dosage of this drug
causes a decrease in blood pressure. This cause-and-effect conclusion is okay
because they controlled for other variables that could affect blood pressure in
their experiment, such as other drugs taken, age, general health, and so on.