Page 94 - Becoming Metric Wise



          dependent variable. When the independent variable is time, the scatterplot
          represents evolution of a variable over time.
             It is possible to determine a best fitting line for a scatterplot. If the
          independent variable is denoted as x, and the dependent one as y, then
          this best fitting line has the equation
          \[ y = a + bx \tag{4.12} \]

          Here a is called the intercept and b the slope. When the slope is positive,
          the line is increasing; when it is negative, the line is decreasing.
          When b is zero, y is constant. The best fitting line through the scatterplot
          $(x_i, y_i)_i$ is obtained as follows:

          \[
            b = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}}
                     {\frac{1}{n}\sum_{i=1}^{n} x_i^2 - (\bar{x})^2}
            \tag{4.13}
          \]
          and
          \[ a = \bar{y} - b\,\bar{x} \tag{4.14} \]
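
As an illustration, Eqs. (4.13) and (4.14) can be turned into a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `fit_line` and the sample data are hypothetical, and the bars over x and y are read as arithmetic means.

```python
def fit_line(xs, ys):
    """Return (a, b) for the best fitting line y = a + b*x, per Eqs. (4.13)-(4.14)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n   # (1/n) * sum of x_i * y_i
    mean_x2 = sum(x * x for x in xs) / n               # (1/n) * sum of x_i squared

    denominator = mean_x2 - mean_x ** 2
    # Simplification: an exact comparison with zero; real data may need a tolerance.
    if denominator == 0:
        raise ValueError("all x values are equal, so the slope is undefined")

    b = (mean_xy - mean_x * mean_y) / denominator      # Eq. (4.13)
    a = mean_y - b * mean_x                            # Eq. (4.14)
    return a, b


# Hypothetical data: a count observed in five consecutive years.
years = [1, 2, 3, 4, 5]
counts = [2, 4, 5, 4, 5]
a, b = fit_line(years, counts)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")  # approximately a = 2.20, b = 0.60
```

The slope here is positive, so the fitted line is increasing, in agreement with the overall trend of the sample data.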
             A best fitting line, often called a regression line, can be calculated
          for any scatterplot in which the x-values are not all equal, even if the
          scatterplot has no linear appearance at all. For this reason, a measure of
          the quality of the fit of the regression line to the scatterplot is
          calculated. This measure is called the Pearson correlation coefficient.


          4.7.2 Pearson Correlation

          The Pearson correlation coefficient is given as:
          \[
            r(x, y) = \frac{b\, s_x}{s_y}
                    = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}}
                           {\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - (\bar{x})^2}\;
                            \sqrt{\frac{1}{n}\sum_{i=1}^{n} y_i^2 - (\bar{y})^2}}
            \tag{4.15}
          \]
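
The right-hand side of Eq. (4.15) can be computed directly. The following is a minimal sketch under the same reading of the notation (bars denote arithmetic means); the function name `pearson_r` and the sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, following the right-hand side of Eq. (4.15)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2   # square of s_x
    var_y = sum(y * y for y in ys) / n - mean_y ** 2   # square of s_y

    if var_x <= 0 or var_y <= 0:
        raise ValueError("r is undefined when x or y is constant")

    return numerator / (math.sqrt(var_x) * math.sqrt(var_y))


# Hypothetical data: a count observed in five consecutive years.
years = [1, 2, 3, 4, 5]
counts = [2, 4, 5, 4, 5]
print(f"r = {pearson_r(years, counts):.3f}")  # approximately 0.775
```

Points lying exactly on an increasing line give a value of 1 (up to rounding), and points lying exactly on a decreasing line give a value of -1.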
             As the standard deviations $s_x$ and $s_y$ are positive (whenever the
          data are not constant), this equation shows that the correlation
          coefficient and the slope of the regression line have the same sign. Of
          course, neither the regression line nor the correlation coefficient is
          usually calculated by hand. One uses a software package or a pocket
          calculator.