Page 95 - Becoming Metric Wise
                                                                   Statistics

                 Using the correlation coefficient, a best fitting line can be rewritten,
              more symmetrically, as:
              \[ \frac{y - \bar{y}}{s_y} = r(x, y)\,\frac{x - \bar{x}}{s_x} \tag{4.16} \]
                 It can be shown that -1 ≤ r(x, y) ≤ +1. If r is about zero there is no
              linear relation between the variables x and y. If r is close to -1 or +1
              there is a strong linear relation. For values between 0 and 1 the linear
              relation is more or less strong and positive; for values between -1 and 0
              the linear relation is similarly more or less strong and negative. Note that
              it is possible that a weak linear relation corresponds to a strong nonlinear
              relation, e.g., an exponential one.
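
The behavior of r described above can be checked numerically. The following is a minimal sketch in plain Python, not a definitive implementation: the helper names (`mean`, `sample_std`, `pearson_r`, `fitted_y`) and the sample data are illustrative assumptions, and the standard deviations use the n - 1 convention, which may differ from the one used elsewhere in this chapter (the ratio s_y / s_x in Eq. (4.16) is the same under either convention).

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_std(values):
    # Assumption: sample standard deviation with the n - 1 denominator.
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(xs, ys):
    # Pearson correlation: covariance divided by the product of the spreads.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fitted_y(x, xs, ys):
    # Eq. (4.16) solved for y: y = y_bar + r * s_y * (x - x_bar) / s_x
    r = pearson_r(xs, ys)
    return mean(ys) + r * sample_std(ys) * (x - mean(xs)) / sample_std(xs)

xs = [1, 2, 3, 4, 5]
ys_linear = [2 * x + 1 for x in xs]       # exact linear relation
ys_exp = [math.exp(2 * x) for x in xs]    # exact, but exponential, relation

r_linear = pearson_r(xs, ys_linear)       # close to +1
r_exp = pearson_r(xs, ys_exp)             # clearly below +1
print(r_linear, r_exp, fitted_y(6, xs, ys_linear))
```

In this sketch the exponential data are perfectly determined by x, yet r stays clearly below 1, because r only measures how well a straight line describes the relation.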

              4.7.3 Spearman Correlation

              The Pearson correlation coefficient measures a linear relation and can be
              highly sensitive to outliers. In such cases one prefers the Spearman
              correlation, which is a robust measure of association. It is determined by
              ranking the observations within each of the two groups (from largest to
              smallest or vice versa; this does not matter, as long as both groups are
              ranked in the same direction). In case of ties, an average rank is used.
              The Spearman correlation coefficient is then calculated in exactly the same
              way as the Pearson correlation, but using ranks instead of the real
              observations. The interpretation of the Spearman correlation also differs
              from Pearson's. The Pearson correlation coefficient is a measure of
              linearity, while Spearman's is a measure of monotonicity, i.e., it
              determines whether or not the order between the variables is preserved. Of
              course, a perfect linear relation is monotone, but the converse does not
              hold.
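
The ranking procedure described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than a reference implementation: the function names (`average_ranks`, `pearson_r`, `spearman`) and the sample data are hypothetical, and ranks are assigned from smallest to largest.

```python
import math

def average_ranks(values):
    # Rank from smallest (rank 1) to largest; tied values share the average rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def pearson_r(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman(xs, ys):
    # Spearman correlation: Pearson correlation applied to the ranks.
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [math.exp(x) for x in xs]  # monotone increasing, but not linear

print(average_ranks([10, 20, 20, 30]))  # the two tied values share rank 2.5
print(pearson_r(xs, ys), spearman(xs, ys))
```

For the monotone but nonlinear data in this sketch, the Spearman coefficient equals 1 (up to rounding) while the Pearson coefficient is smaller, which matches the distinction between monotonicity and linearity drawn in the text.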
                 It can be shown that the Spearman rank correlation coefficient R_S can
              be calculated as:
              \[ R_S = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \tag{4.17} \]
              where d_i denotes the difference in ranking for the i-th item and n is the
              number of items studied.
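
As a small worked example of Eq. (4.17), the sketch below computes R_S directly from two lists of ranks. The function name `spearman_from_differences` and the rank lists are hypothetical. Note also, as a general property of this shortcut rather than a statement from the text, that it agrees exactly with the Pearson-on-ranks definition only when there are no tied ranks.

```python
def spearman_from_differences(rank_x, rank_y):
    # Eq. (4.17): R_S = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))
    n = len(rank_x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]   # two neighboring pairs swapped

# Differences d_i are -1, 1, -1, 1, 0, so sum(d_i^2) = 4 and n = 5:
# R_S = 1 - 6 * 4 / (5 * 24) = 1 - 0.2 = 0.8
r_s = spearman_from_differences(rank_x, rank_y)
print(r_s)

# Identical rankings give +1; completely reversed rankings give -1.
r_same = spearman_from_differences(rank_x, rank_x)
r_reversed = spearman_from_differences(rank_x, [5, 4, 3, 2, 1])
print(r_same, r_reversed)
```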

              4.8 NONPARAMETRIC LINEAR REGRESSION

              Nonparametric linear regression is a distribution-free method for investigat-
              ing a linear relationship between two variables Y (dependent, outcome) and