Page 339 - Statistics for Environmental Engineers









TABLE 39.2
Values of $R^2$ Required to Establish Statistical Significance of a Simple Linear Regression Equation for Various Sample Sizes

Sample Size     Statistical Significance Level
     n          10%      5%       1%
     3          0.98     0.99     0.99
     4          0.81     0.90     0.98
     5          0.65     0.77     0.92
     6          0.53     0.66     0.84
     8          0.39     0.50     0.70
    10          0.30     0.40     0.59
    12          0.25     0.33     0.50
    15          0.19     0.26     0.41
    20          0.14     0.20     0.31
    25          0.11     0.16     0.26
    30          0.09     0.13     0.22
    40          0.07     0.10     0.16
    50          0.05     0.08     0.13
   100          0.03     0.04     0.07

Source: Hahn, G. J. (1973). Chemtech, October, pp. 609–611.
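
The thresholds in Table 39.2 can be checked approximately by simulation. The sketch below is not Hahn's method. It assumes independent, normally distributed errors and no true relationship between x and y, then estimates the $R^2$ value that chance alone exceeds only a fraction alpha of the time. The function names, the number of simulations, and the seed are illustrative choices, not part of the original text.

```python
import random


def r_squared(x, y):
    """Coefficient of determination for a least-squares straight line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def critical_r_squared(n, alpha, n_sim=20000, seed=1):
    """Monte Carlo estimate of the R^2 exceeded with probability alpha
    when y is pure normal noise (assumed null model, hypothetical helper)."""
    rng = random.Random(seed)
    x = list(range(n))  # any fixed, non-constant x values will do
    values = []
    for _ in range(n_sim):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        values.append(r_squared(x, y))
    values.sort()
    index = min(int((1.0 - alpha) * n_sim), n_sim - 1)
    return values[index]


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, round(critical_r_squared(n, 0.05), 2))
```

Because the estimate is based on random draws, it should agree with the tabulated 5% column only to about two decimal places.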



may explain a large proportion of the variability in the dependent variable, and thus have a high $R^2$, yet
unexplained variability may be too large for useful prediction. It is not possible to tell from the magnitude
of $R^2$ how accurate the predictions will be.
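
A small numerical sketch may make this concrete. The data below are invented for illustration. Multiplying every y value by 10 leaves $R^2$ unchanged but makes the residual standard deviation, which governs prediction error in the units of y, ten times larger. The helper name `fit_line` is hypothetical.

```python
import math


def fit_line(x, y):
    """Fit y = b0 + b1*x by least squares.

    Returns (R^2, residual standard deviation with n - 2 degrees of freedom).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / (n - 2))


# Invented example data (not from the text).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_small = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
y_large = [10 * v for v in y_small]  # same pattern, ten times the scale

r2_small, s_small = fit_line(x, y_small)
r2_large, s_large = fit_line(x, y_large)
print("R^2:", r2_small, r2_large)            # the two values agree
print("residual s.d.:", s_small, s_large)    # the second is ten times the first
```

Whether either residual standard deviation is acceptable depends on how precise the predictions need to be, which $R^2$ alone does not reveal.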





The Magnitude of $R^2$ Depends on the Range of Variation in X
The value of $R^2$ decreases with a decrease in the range of variation of the independent variable, other
things being equal, and assuming the correct model is being fitted to the data. Figure 39.3 (upper
left-hand panel) shows a set of 50 data points that has $R^2$ = 0.77. Suppose, however, that the range
of x that could be investigated is only from 14 to 16 (for example, because a process is carefully
constrained within narrow operating limits) and the available data are those shown in the upper
right-hand panel of Figure 39.3. The underlying relationship is the same, and the measurement error in
each observation is the same, but $R^2$ is now only 0.12. This dramatic reduction in $R^2$ occurs mainly
because the range of x is restricted and not because the number of observations is reduced. This is
shown by the two lower panels. Fifteen points (the same number as found in the range of x = 14 to
16), located at x = 10, 15, and 20, give $R^2$ = 0.88. Just 10 points, at x = 10 and 20, give an even
larger value, $R^2$ = 0.93.
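
The effect of a restricted range can be reproduced with simulated data. The sketch below does not use the data of Figure 39.3. It assumes a straight-line model with slope 1, intercept 0, and normal errors with standard deviation 1.5 (all hypothetical values), and draws the same number of points from a wide range (10 to 20) and a narrow range (14 to 16), so that any difference in $R^2$ comes from the range and not from the sample size.

```python
import random


def r_squared(x, y):
    """Coefficient of determination for a least-squares straight line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def simulate(x_low, x_high, n, rng, intercept=0.0, slope=1.0, sigma=1.5):
    """Draw n points from an assumed model y = intercept + slope*x + error."""
    x = [rng.uniform(x_low, x_high) for _ in range(n)]
    y = [intercept + slope * xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y


rng = random.Random(42)  # fixed seed so the run is repeatable
x_wide, y_wide = simulate(10, 20, 500, rng)
x_narrow, y_narrow = simulate(14, 16, 500, rng)

r2_wide = r_squared(x_wide, y_wide)
r2_narrow = r_squared(x_narrow, y_narrow)
print("wide range   R^2 =", round(r2_wide, 2))
print("narrow range R^2 =", round(r2_narrow, 2))
```

The model and the error variance are identical in both runs; only the spread of x differs.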
These examples show that a large value of $R^2$ might reflect the fact that data were collected over
an unrealistically large range of the independent variable x. This can happen, especially when x is
time. Conversely, a small value might be due to a limited range of x, such as when x is carefully
controlled by a process operator. In this case, x is constrained to a narrow range because it is known
to be highly important, yet this importance will not be revealed by doing regression on typical data
from the process.
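
A rough way to see why the range matters is sketched below. It assumes the straight-line model $y = \beta_0 + \beta_1 x + e$ with error variance $\sigma^2$, and writes $s_x^2$ for the sample variance of the x values. These symbols are introduced here for illustration and are not the text's notation, and the relation is an approximation that ignores sampling variation.

```latex
R^2 \;\approx\; \frac{\beta_1^2 \, s_x^2}{\beta_1^2 \, s_x^2 + \sigma^2}
```

With $\beta_1$ and $\sigma^2$ fixed, widening the range of x increases $s_x^2$ and pushes $R^2$ toward 1, while narrowing the range pushes it toward 0, even though the relationship and the measurement error are unchanged.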
Linear calibration curves always have a very high $R^2$, usually 0.99 and above. One reason is that the
x variable covers a wide range (see Chapter 36).
                       © 2002 By CRC Press LLC