Page 278 - Statistics for Environmental Engineers

                       This is the Pearson product-moment correlation coefficient, usually just called the correlation coefficient.
                       The range of r is from −1 to +1.
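
As an illustration, the sample correlation coefficient can be computed directly from the sums of squares and cross-products. This is a minimal sketch: the function name pearson_r and the five data points are made up for the example and are not taken from the book.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    # Raises ZeroDivisionError if either variable is constant (r is undefined).
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative values, not data from the book.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson_r(x, [2.0, 4.0, 6.0, 8.0, 10.0])    # points on a rising line
r_down = pearson_r(x, [10.0, 8.0, 6.0, 4.0, 2.0])  # points on a falling line
r_mixed = pearson_r(x, [2.0, 1.0, 4.0, 3.0, 5.0])  # scattered but increasing
print(r_up, r_down, r_mixed)
```

Points lying exactly on a rising or falling straight line give the two ends of the range, +1 and −1; scattered points give a value in between.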




                       Case Study: Correlation of BOD and COD Measurements
                       Figure 31.1 shows n = 90 pairs of effluent BOD₅ and COD concentrations (mg/L) from Table 31.1, and
                       the same data after a log transformation. We know that these two measures of wastewater strength are
                       related. The purpose of calculating a correlation coefficient is to quantify the strength of the relationship.
                        We find r = 0.59, indicating a moderate positive correlation, which is consistent with the impression
                       gained from the graphical display. It makes no difference whether COD or BOD is plotted on the x-axis;
                       the sample correlation coefficient is still r = 0.59. The log-transformed data have symmetry
                       about the median, but they also appear variable, and perhaps curvilinear, and the correlation coefficient
                       is reduced (r = 0.53).
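
Both observations can be checked numerically: swapping the axes leaves r unchanged, while a log transformation generally changes it. The sketch below uses seven hypothetical (COD, BOD) pairs invented for illustration, not the values in Table 31.1, so the resulting r values will not match those quoted above.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical concentrations in mg/L, invented for illustration only.
cod = [4.0, 6.0, 9.0, 12.0, 15.0, 20.0, 25.0]
bod = [3.0, 8.0, 5.0, 9.0, 7.0, 14.0, 10.0]

# Swapping which variable goes on the x-axis does not change r.
r_bod_vs_cod = pearson_r(cod, bod)
r_cod_vs_bod = pearson_r(bod, cod)

# A log transformation is nonlinear, so r is generally different afterwards.
ln_cod = [math.log(v) for v in cod]
ln_bod = [math.log(v) for v in bod]
r_log = pearson_r(ln_cod, ln_bod)
r_log_swapped = pearson_r(ln_bod, ln_cod)

print(f"raw: {r_bod_vs_cod:.3f} {r_cod_vs_bod:.3f}")
print(f"log: {r_log:.3f} {r_log_swapped:.3f}")
```

Whether the transformation raises or lowers r depends on the data. For the BOD and COD measurements in the case study it lowers it.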
                        It is tempting to use ordinary regression to fit a straight line for predicting BOD from COD, as shown
                       in Figure 31.2. The model would be BOD = 2.5 + 1.6 COD, with R² = 0.35. Fitting COD = 2.74 + 0.22
                       BOD also gives R² = 0.35. Notice that R² is the same in both cases and that it happens to be the square
                       of the correlation coefficient between the two variables (r² = 0.59² = 0.35). In effect, regression has
                       revealed the same information about the strength of the association, although R² and r are different
                       statistics with different interpretations. This correspondence between r and R² is true only for
                       straight-line relations.
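
The correspondence can be verified with a small least-squares sketch. The helper fit_line and the seven data pairs are hypothetical, introduced only for this example. A related identity, also checked below, is that the two fitted slopes multiply to r²; the slopes reported above are consistent with it, since 1.6 × 0.22 is approximately 0.35.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1*x; returns (b0, b1, R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, 1.0 - ss_res / syy

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical concentrations in mg/L, invented for illustration only.
cod = [4.0, 6.0, 9.0, 12.0, 15.0, 20.0, 25.0]
bod = [3.0, 8.0, 5.0, 9.0, 7.0, 14.0, 10.0]

b0_bc, b1_bc, r2_bc = fit_line(cod, bod)  # BOD regressed on COD
b0_cb, b1_cb, r2_cb = fit_line(bod, cod)  # COD regressed on BOD
r = pearson_r(cod, bod)

print(f"BOD on COD: R^2 = {r2_bc:.4f}")
print(f"COD on BOD: R^2 = {r2_cb:.4f}")
print(f"r^2 = {r * r:.4f}, slope product = {b1_bc * b1_cb:.4f}")
```

The two fitted lines differ, yet all four printed quantities agree up to rounding.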
                       [Figure 31.1: four scatterplot panels. Top: BOD vs. COD and COD vs. BOD (r = 0.59 in both). Bottom: ln(BOD) vs. ln(COD) and ln(COD) vs. ln(BOD) (r = 0.53 in both).]
                       FIGURE 31.1 Scatterplot for 90 pairs of effluent five-day BOD vs. COD measurements, and ln(BOD) vs. ln(COD).

                       [Figure 31.2: two scatterplot panels with fitted lines. Left: BOD vs. COD, BOD = 2.5 + 1.6 COD (R² = 0.35). Right: COD vs. BOD, COD = 2.74 + 0.22 BOD (R² = 0.35).]

                       FIGURE 31.2 Two possible regressions on the COD and BOD₅ data. Both are invalid because the x and y variables have
                       substantial measurement error.
                       © 2002 By CRC Press LLC