Page 299 - Statistics for Dummies

Chapter 18: Looking for Links: Correlation and Regression
Calculating the correlation
In the earlier section “Interpreting a scatterplot,” I say data that resembles an uphill line has a positive linear relationship and data that resembles a downhill line has a negative linear relationship. However, I didn’t address the issue of whether or not the linear relationship was strong or weak. The strength of a linear relationship depends on how closely the data resembles a line, and of course varying levels of “closeness to a line” exist.
Can one statistic measure both the strength and direction of a linear relationship between two variables? Sure! Statisticians use the correlation coefficient to measure the strength and direction of the linear relationship between two numerical variables X and Y. The correlation coefficient for a sample of data is denoted by r.
Although the street definition of correlation applies to any two items that are related (such as gender and political affiliation), statisticians use this term only in the context of two numerical variables. The formal term for correlation is the correlation coefficient. Many different correlation measures have been created; the one used in this case is called the Pearson correlation coefficient (but from now on I’ll just call it the correlation).
The formula for the correlation (r) is

$$r = \frac{1}{n-1}\,\frac{\sum (x-\bar{x})(y-\bar{y})}{s_x\, s_y}$$

where n is the number of pairs of data; $\bar{x}$ and $\bar{y}$ are the sample means of all the x-values and all the y-values, respectively; and $s_x$ and $s_y$ are the sample standard deviations of all the x- and y-values, respectively.
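The formula for r translates almost line for line into code. The sketch below is a minimal Python version, assuming the data arrives as two equal-length lists of numbers; the function name `pearson_r` is hypothetical. It uses `statistics.stdev`, which divides by n - 1, so s_x and s_y are sample standard deviations, as in the text.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation r = sum((x - x_bar)(y - y_bar)) / ((n - 1) * s_x * s_y).

    pearson_r is a hypothetical helper name used for illustration.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    x_bar, y_bar = mean(xs), mean(ys)     # sample means
    s_x, s_y = stdev(xs), stdev(ys)       # sample standard deviations (n - 1)
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when either variable has no spread")
    # Sum of the products of deviations over all n pairs
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / ((n - 1) * s_x * s_y)

print(round(pearson_r([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]), 3))  # 0.8
```

Rounding only hides floating-point noise. The zero-spread check is a design choice: if every x-value (or every y-value) is the same, the denominator is 0 and r cannot be computed.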
Use the following steps to calculate the correlation, r, from a data set:
1. Find the mean of all the x-values ($\bar{x}$) and the mean of all the y-values ($\bar{y}$).
   See Chapter 5 for more on calculating the mean.
2. Find the standard deviation of all the x-values (call it $s_x$) and the standard deviation of all the y-values (call it $s_y$).
   See Chapter 5 to find out how to calculate the standard deviation.
3. For each (x, y) pair in the data set, take x minus $\bar{x}$ and y minus $\bar{y}$, and multiply them together to get $(x-\bar{x})(y-\bar{y})$.
4. Add up all the results from Step 3.
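Here are Steps 1 through 4 traced on a small, made-up data set of five (x, y) pairs, as a Python sketch. The last lines finish the calculation by dividing the Step 4 total by (n - 1) times s_x times s_y, which is what the Pearson formula for r calls for; that final division is included for completeness and is not quoted from the numbered steps.

```python
from statistics import mean, stdev

# Made-up example data: five (x, y) pairs
xs = [1, 2, 3, 4, 5]
ys = [1, 3, 2, 5, 4]
n = len(xs)

# Step 1: mean of the x-values and mean of the y-values
x_bar, y_bar = mean(xs), mean(ys)        # both equal 3

# Step 2: sample standard deviations (statistics.stdev divides by n - 1)
s_x, s_y = stdev(xs), stdev(ys)          # both equal sqrt(2.5), about 1.581

# Step 3: for each pair, multiply (x - x_bar) by (y - y_bar)
products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]  # 4, 0, 0, 2, 2

# Step 4: add up all the results from Step 3
total = sum(products)                    # 8

# Finish (per the Pearson formula): divide by (n - 1) * s_x * s_y
r = total / ((n - 1) * s_x * s_y)
print(round(r, 3))                       # 0.8
```

Keeping each step as its own variable makes it easy to check intermediate values by hand against the numbered steps.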











                                                                                                                           3/25/11   8:13 PM
                             26_9780470911082-ch18.indd   283