Page 299 - Statistics for Dummies
Chapter 18: Looking for Links: Correlation and Regression
Calculating the correlation
In the earlier section “Interpreting a scatterplot,” I say that data resembling
an uphill line has a positive linear relationship and data resembling a
downhill line has a negative linear relationship. However, I didn’t address
whether that linear relationship is strong or weak. The strength of
a linear relationship depends on how closely the data resembles a line, and of
course varying levels of “closeness to a line” exist.
Can one statistic measure both the strength and direction of a linear relation-
ship between two variables? Sure! Statisticians use the correlation coefficient
to measure the strength and direction of the linear relationship between two
numerical variables X and Y. The correlation coefficient for a sample of data
is denoted by r.
Although the street definition of correlation applies to any two items that are
related (such as gender and political affiliation), statisticians use this term
only in the context of two numerical variables. The formal term for correlation
is the correlation coefficient. Many different correlation measures have been
created; the one used in this case is called the Pearson correlation coefficient
(but from now on I’ll just call it the correlation).
The formula for the correlation (r) is
\[
r = \frac{1}{n - 1} \cdot \frac{\sum (x - \bar{x})(y - \bar{y})}{s_x \, s_y}
\]
where n is the number of pairs of data; $\bar{x}$ and $\bar{y}$ are the sample means of all
the x-values and all the y-values, respectively; and $s_x$ and $s_y$ are the sample
standard deviations of all the x- and y-values, respectively.
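If you'd rather let a computer do the arithmetic, here is a minimal sketch of the correlation formula in Python. The function name pearson_r and the small data set at the bottom are my own made-up illustrations, not taken from the book. The sketch assumes the sample standard deviation divides by n - 1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the sample correlation r for paired x- and y-values.

    Hypothetical helper for illustration; assumes the sample standard
    deviation (dividing by n - 1) is the one intended.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two (x, y) pairs")

    # Sample means of the x-values and the y-values.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sample standard deviations of the x-values and the y-values.
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")

    # Sum of (x - x_bar) * (y - y_bar) over every pair.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Divide by (n - 1) and by the product of the standard deviations.
    return total / ((n - 1) * s_x * s_y)


if __name__ == "__main__":
    # Made-up numbers, purely to show the call.
    xs = [1, 2, 3, 4]
    ys = [1, 3, 2, 4]
    print(round(pearson_r(xs, ys), 3))
```

The function raises an error rather than returning a number when either variable has no variation, because the formula would then divide by zero.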
Use the following steps to calculate the correlation, r, from a data set:
1. Find the mean of all the x-values ($\bar{x}$) and the mean of all the y-values ($\bar{y}$).
See Chapter 5 for more on calculating the mean.
2. Find the standard deviation of all the x-values (call it $s_x$) and the
standard deviation of all the y-values (call it $s_y$).
See Chapter 5 to find out how to calculate the standard deviation.
3. For each (x, y) pair in the data set, take x minus $\bar{x}$ and y minus $\bar{y}$, and
multiply them together to get $(x - \bar{x})(y - \bar{y})$.
4. Add up all the results from Step 3.