Page 309 - Statistics for Dummies

Chapter 18: Looking for Links: Correlation and Regression
For example, in the chirping data, no data were collected for fewer than 18 chirps or more than 39 chirps per 15 seconds (refer to Table 18-1). If you try to make predictions outside this range, you are going into uncharted territory; the farther outside this range you go with your x-values, the more dubious your predictions for y will get. Who's to say the line still works outside of the area where data were collected? Do you really think that crickets will chirp faster and faster without limit? At some point they would either pass out or burn up! And what does a negative number of chirps really mean? (Is this similar to asking what the sound of one hand clapping is?)
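A minimal Python sketch of this caution: the hypothetical helper `predict` computes the predicted y from a line and also reports whether x falls inside the range where data were collected. The defaults of 18 and 39 come from the chirp range quoted above; the intercept and slope in the example are made-up placeholders, not the fitted line for Table 18-1.

```python
def predict(x, intercept, slope, x_min=18.0, x_max=39.0):
    """Return (predicted y, True if x lies inside the observed x-range).

    x_min and x_max default to the chirp range from the text (18 to 39
    chirps per 15 seconds). intercept and slope are placeholders supplied
    by the caller, not values taken from Table 18-1.
    """
    y_hat = intercept + slope * x
    inside = x_min <= x <= x_max  # False means you are extrapolating
    return y_hat, inside


# Hypothetical line, for illustration only: y = 40 + 1.0 * x
print(predict(25, intercept=40.0, slope=1.0))  # (65.0, True)
print(predict(60, intercept=40.0, slope=1.0))  # (100.0, False) -> extrapolation
```

Returning a flag instead of refusing to predict leaves the judgment to you: a value just outside the range may be tolerable, while one far outside it deserves the skepticism described above.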
Be aware that not every data point will necessarily fit the regression line well, even if the correlation is high. A point or two may fall outside the overall pattern of the rest of the data; such points are called outliers. One or two outliers probably won't affect the overall fit of the regression line much, but the line will still do a poor job at those particular points.
The numerical difference between the predicted value of y from the line and the actual y-value you got from your data is called a residual. Outliers have large residuals compared to the rest of the points; they are worth investigating to see whether there was an error in the data at those points or whether there is something particularly interesting in the data to follow up on. (I give a much more detailed look at residuals in the book Statistics II For Dummies.)
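To make the definition concrete, here is a small Python sketch that computes residuals as observed y minus predicted y (the usual sign convention) and flags points whose residual is unusually large. The function names, the made-up data, the line y = 1 + 2x (supplied directly rather than fitted), and the "more than two standard deviations from the mean residual" cutoff are all illustrative assumptions, not a rule from this book.

```python
from statistics import mean, stdev


def residuals(xs, ys, intercept, slope):
    """Residual for each point: observed y minus the y predicted by the line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def flag_large_residuals(res, k=2.0):
    """Indices of residuals more than k standard deviations from the mean residual.

    k = 2 is an assumed rule of thumb, used here for illustration only.
    """
    m, s = mean(res), stdev(res)
    return [i for i, r in enumerate(res) if abs(r - m) > k * s]


# Made-up data scattered around the hypothetical line y = 1 + 2x,
# with one point (x = 8) far above the pattern.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3.2, 4.9, 7.1, 8.8, 11.0, 13.1, 15.0, 25.0, 19.1, 20.9]

res = residuals(xs, ys, intercept=1.0, slope=2.0)
print(flag_large_residuals(res))  # [7] -> the point at x = 8
```

A flagged index is only a prompt to go back and check that data point, as the paragraph above suggests; it is not a reason to delete it automatically.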
Explaining the Relationship: Correlation versus Cause and Effect
Scatterplots and correlations identify and quantify relationships between two variables. However, if a scatterplot shows a definite pattern and the data are found to have a strong correlation, that doesn't necessarily mean that a cause-and-effect relationship exists between the two variables. A cause-and-effect relationship is one where a change in one variable (call it X) causes a change in another variable (call it Y). (In other words, the change in Y is not only associated with a change in X, but also directly caused by X.)
For example, suppose a well-controlled medical experiment is conducted to determine the effect of the dosage of a certain drug on blood pressure. (See a total breakdown of experiments in Chapter 17.) The researchers look at their scatterplot and see a definite downhill linear pattern; they calculate the correlation, and it's strong (and negative, because the pattern goes downhill). They conclude that increasing the dosage of this drug causes a decrease in blood pressure. This cause-and-effect conclusion is okay because they controlled for other variables that could affect blood pressure in their experiment, such as other drugs taken, age, general health, and so on.










