Page 343 - Statistics for Environmental Engineers

                        Totally spurious correlations, often with high R² values, can arise when unrelated variables are
                       combined. Two examples of particular interest to environmental engineers are presented by Sherwood
                       (1974) and Rowe (1974). Both emphasize graphical analysis to stimulate and support any regression
                       analysis. Rowe discusses the particular dangers that arise when sets of variables are combined to create
                       new variables such as dimensionless numbers (Froude number, etc.). Benson (1965) points out the same
                       kinds of dangers in the context of hydraulics and hydrology.
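
The effect is easy to reproduce by simulation. The sketch below is illustrative only and is not taken from the cited papers. It draws three independent random variables (the names x, y, z and their ranges are assumptions chosen for illustration), then divides x and y by the common variable z, much as one does when forming ratios or dimensionless groups that share a term.

```python
import random

def r_squared(u, v):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv ** 2 / (s_uu * s_vv)

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 2000

# Three mutually independent variables (ranges are arbitrary assumptions).
x = [rng.uniform(9, 11) for _ in range(n)]
y = [rng.uniform(9, 11) for _ in range(n)]
z = [rng.uniform(1, 10) for _ in range(n)]

# x and y are unrelated, so their R^2 should be near zero.
r2_raw = r_squared(x, y)

# Dividing both by the same z builds a shared term into the new variables.
x_over_z = [a / c for a, c in zip(x, z)]
y_over_z = [b / c for b, c in zip(y, z)]
r2_ratio = r_squared(x_over_z, y_over_z)

print(f"R^2 of x vs y:     {r2_raw:.3f}")
print(f"R^2 of x/z vs y/z: {r2_ratio:.3f}")
```

Because z varies far more than x or y here, both ratios are driven mainly by 1/z, so the second R² comes out high even though x and y carry no information about each other. Plotting x/z against y/z alongside x against y makes the same point graphically.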



                       References
                       Anderson-Sprecher, R. (1994). “Model Comparison and R²,” Am. Stat., 48(2), 113–116.
                       Anscombe, F. J. (1973). “Graphs in Statistical Analysis,” Am. Stat., 27, 17–21.
                       Benson, M. A. (1965). “Spurious Correlation in Hydraulics and Hydrology,” J. Hydraulics Div., ASCE, 91,
                           HY4, 35–45.
                       Box, G. E. P. (1966). “The Use and Abuse of Regression,” Technometrics, 8, 625–629.
                       Box, G. E. P. and J. Wetz (1973). “Criteria for Judging Accuracy of Estimation by an Approximating Response
                           Function,” Madison, WI, University of Wisconsin Statistics Department, Tech. Rep. No. 9.
                       Draper, N. R. and H. Smith (1998). Applied Regression Analysis, 3rd ed., New York, John Wiley.
                       Hahn, G. J. (1973). “The Coefficient of Determination Exposed,” Chemtech, October, pp. 609–611.
                       Rowe, P. N. (1974). “Correlating Data,” Chemtech, January, pp. 9–14.
                       Sherwood, T. K. (1974). “The Treatment and Mistreatment of Data,” Chemtech, December, pp. 736–738.
                       Tufte, E. R. (1983). The Visual Display of Quantitative Information, Cheshire, CT, Graphics Press.



                       Exercises
                        39.1 COD Calibration. The ten pairs of readings below were obtained to calibrate a UV
                             spectrophotometer to measure chemical oxygen demand (COD) in wastewater.


                              COD (mg/L)       60    90   100   130   195   250   300   375   500   600
                              UV Absorbance  0.30  0.35  0.45  0.48  0.95  1.30  1.60  1.80   2.3  2.55

                             (a) Fit a linear model to the data and obtain the R² value. (b) Discuss the meaning of R² in
                             the context of this calibration problem. (c) Exercise 36.3 contains a larger calibration data
                             set for the same instrument. (d) Fit the model to the larger sample and compare the values
                             of R². Will the calibration curve with the highest R² best predict the COD concentration?
                             Explain why or why not.
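
For part (a), a least-squares fit can be sketched as follows. Treating absorbance as the response and COD as the predictor is an assumption here (the usual arrangement when the standards have known concentrations). For a straight-line fit, R² is the same whichever variable is taken as the response. The variable names are illustrative.

```python
# Calibration data from Exercise 39.1.
cod = [60, 90, 100, 130, 195, 250, 300, 375, 500, 600]                 # mg/L
absorbance = [0.30, 0.35, 0.45, 0.48, 0.95, 1.30, 1.60, 1.80, 2.3, 2.55]

n = len(cod)
mean_x = sum(cod) / n
mean_y = sum(absorbance) / n

# Corrected sums of squares and cross-products.
s_xx = sum((x - mean_x) ** 2 for x in cod)
s_yy = sum((y - mean_y) ** 2 for y in absorbance)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cod, absorbance))

# Least-squares estimates for absorbance = b0 + b1 * COD.
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x

# R^2 = 1 - (residual sum of squares) / (total sum of squares).
residuals = [y - (b0 + b1 * x) for x, y in zip(cod, absorbance)]
ss_res = sum(e ** 2 for e in residuals)
r2 = 1 - ss_res / s_yy

print(f"intercept = {b0:.4f}, slope = {b1:.6f}, R^2 = {r2:.3f}")
```

The residuals list is kept so that it can be plotted against COD. A residual plot says more about whether the calibration line is adequate than R² does by itself.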
                        39.2 Stream pH. The data below are n = 200 monthly pH readings on a stream that cover a period of
                             almost 20 years. The data read from left to right. The fitted regression model is ŷ = 7.1435 −
                             0.0003776t; R² = 0.042. The confidence interval of the slope is [−0.00063, −0.000013]. Why
                             is R² so low? Is the regression statistically significant? Is stream pH decreasing? What is the
                             practical value of the model?
                              7.0 7.2 7.2 7.3 7.2 7.2 7.2 7.2 7.0 7.1 7.3 7.1 7.1 7.1 7.2 7.3 7.2 7.3 7.2  7.2
                              7.1 7.4 7.1 6.8 7.3 7.3 7.0 7.0 6.9 7.2 7.2 7.3 7.0 7.0 7.1 7.1 7.0 7.2 7.2  7.2
                              7.2 7.1 7.2 7.0 7.0 7.2 7.1 7.1 7.2 7.2 7.2 7.0 7.1 7.1 7.2 7.1 7.2 7.0 7.1  7.2
                              7.1 7.0 7.1 7.4 7.2 7.2 7.2 7.2 7.1 7.0 7.2 7.0 6.9 7.2 7.0 7.0 7.1 7.0 6.9  6.9
                              7.0 7.0 7.2 6.9 7.4 7.0 6.9 7.0 7.1 7.0 7.2 7.2 7.0 7.0 7.1 7.1 7.0 7.2 7.2  7.0
                              7.0 7.2 7.1 7.1 7.1 7.0 7.0 7.0 7.1 7.3 7.1 7.2 7.2 7.2 7.1 7.2 7.2 7.1 7.1  7.1
                              7.2 6.8 7.2 7.2 7.0 7.1 7.1 7.2 7.0 7.1 7.1 7.1 7.0 7.2 7.1 7.1 7.3 6.9 7.2  7.2
                              7.1 7.1 7.0 7.0 7.1 7.1 7.0 7.0 7.0 7.1 7.0 7.1 7.1 7.2 7.2 7.1 7.0 7.0 7.2  7.2
                              7.0 7.1 7.2 7.1 7.1 7.0 7.1 7.0 7.2 7.1 7.1 7.1 7.2 7.1 7.0 7.1 7.2 7.2 7.1  7.2
                              7.0 7.1 7.0 7.1 7.0 6.9 6.9 7.2 7.1 7.2 7.1 7.1 7.0 7.0 6.9 7.1 6.8 7.1 7.0  7.0
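
One way to approach the significance question uses only the quantities quoted in the exercise. For a straight-line model with an intercept, the F statistic for the regression is tied to R² by F = (n − 2)R²/(1 − R²), and the absolute value of the t statistic for the slope is its square root. A minimal sketch of that arithmetic, with illustrative variable names:

```python
import math

n = 200      # number of monthly pH readings (from the exercise)
r2 = 0.042   # reported R^2 of the fitted line (from the exercise)

# F statistic with 1 and n - 2 degrees of freedom for a straight-line fit.
f_stat = (n - 2) * r2 / (1 - r2)

# |t| for the slope, with n - 2 degrees of freedom.
t_abs = math.sqrt(f_stat)

print(f"F = {f_stat:.2f} on 1 and {n - 2} df, |t| = {t_abs:.2f}")
```

Comparing these values with tabulated critical values shows how a large sample can make a slope statistically distinguishable from zero even when the line explains very little of the variation in the readings.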

                       © 2002 By CRC Press LLC