Statistics for Environmental Engineers










                       38




                       Empirical Model Building by Linear Regression






                       KEY WORDS all possible regressions, analysis of variance, coefficient of determination, confidence
                       interval, diagnostic checking, empirical models, F test, least squares, linear regression, overfitting,
                       parsimonious model, polynomial, regression sum of squares, residual plot, residual sum of squares,
                       sedimentation, solids removal, standard error, t statistic, total sum of squares.

                       Empirical models are widely used in engineering. Sometimes the model is a straight line; sometimes a
                       mathematical French curve — a smooth interpolating function — is needed. Regression provides the
                       means for selecting the complexity of the French curve that can be supported by the available data.
                        Regression begins with the specification of a model to be fitted. One goal is to find a parsimonious
                       model — an adequate model with the fewest possible terms. Sometimes the proposed model turns out
                       to be too simple and we need to augment it with additional terms. The much more common case,
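                       however, is to start with more terms than are needed or justified. This is called overfitting. Overfitting
                       is harmful because the prediction error of the model grows in proportion to the number of parameters:
                       for n observations with independent errors of constant variance $\sigma^2$, the variance of the predicted
                       values, averaged over the n data points, is $p\sigma^2/n$, where p is the number of parameters in the model.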
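The dependence of prediction error on the number of parameters can be checked numerically. The sketch below assumes NumPy and polynomial models in a single hypothetical variable x. It builds the hat matrix H = X(X'X)^-1 X'; for independent errors of constant variance sigma^2, the variances of the fitted values are sigma^2 times the diagonal elements of H, and because the trace of H equals p, their average is p*sigma^2/n. The function name average_prediction_variance and the x values are illustrative assumptions.

```python
import numpy as np

def average_prediction_variance(x, degree, sigma2=1.0):
    """Average variance of the fitted values of a polynomial of the given degree.

    Assumes independent errors with constant variance sigma2.
    """
    # Design matrix with columns 1, x, x^2, ..., x^degree (p = degree + 1 parameters)
    X = np.vander(x, degree + 1, increasing=True)
    # Hat matrix H = X (X'X)^-1 X'; sigma2 * diag(H) are the variances of the fitted values
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return sigma2 * np.trace(H) / len(x)

x = np.linspace(0.0, 1.0, 20)   # hypothetical settings of the independent variable
for degree in range(5):
    p = degree + 1
    # The two numbers printed on each line agree: average variance equals p / n (sigma2 = 1)
    print(p, round(average_prediction_variance(x, degree), 4), p / len(x))
```

Each added term raises the average prediction variance by sigma^2/n, whether or not the term is needed to describe the data; this is the statistical cost of overfitting.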
                        A fitted model is always checked for inadequacies. The statistical output of regression programs is
                       somewhat helpful in doing this, but a more satisfying and useful approach is to make diagnostic plots
                       of the residuals. At a minimum, the residuals should be plotted against the predicted values of the fitted
                       model. Plots of residuals against the independent variables are also useful. This chapter illustrates how
                       this diagnosis is used to decide whether terms should be added or dropped to improve a model. If a tentative
                       model is modified, it is refitted and rechecked. The model builder thus works iteratively toward the
                       simplest adequate model.
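The check-and-refit cycle can be sketched in a few lines. This is a minimal illustration assuming NumPy and hypothetical synthetic data (a curved response with small random errors), not data from this chapter. It fits a straight line as the tentative model, summarizes the residuals ordered by predicted value as a string of signs (a crude text stand-in for the residual plot), then adds a quadratic term and refits. The helper names fit_polynomial, residual_signs, and count_runs are illustrative, and counting sign runs is only a rough numerical companion to looking at the actual plot.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least squares fit of y = b0 + b1*x + ... + bk*x**k.

    Returns the coefficients, the predicted values, and the residuals.
    """
    X = np.vander(x, degree + 1, increasing=True)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b
    return b, y_hat, y - y_hat

def residual_signs(y_hat, e):
    """Signs of the residuals ordered by predicted value: a one-line text
    stand-in for a plot of residuals against predicted values."""
    order = np.argsort(y_hat)
    return "".join("+" if ei > 0 else "-" for ei in e[order])

def count_runs(signs):
    """Number of runs of identical signs; a few long runs suggest a systematic misfit."""
    return 1 + sum(a != b for a, b in zip(signs, signs[1:]))

# Hypothetical synthetic data: a curved response with small random errors
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 25)
y = 2.0 + 1.5 * x - 0.12 * x**2 + rng.normal(0.0, 0.05, x.size)

for degree in (1, 2):          # tentative straight line first, then the augmented model
    b, y_hat, e = fit_polynomial(x, y, degree)
    signs = residual_signs(y_hat, e)
    print(f"degree {degree}: RSS = {np.sum(e**2):.4f}, runs = {count_runs(signs)}")
    print("  " + signs)
```

For the straight line, the residuals fall into a few long runs (negative, positive, negative), the signature of curvature the model has missed. After the quadratic term is added, the residuals should show no such pattern and the residual sum of squares drops sharply, so the augmented model is the one carried forward for further checking.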





                       A Model of Sedimentation
                       Sedimentation removes solid particles from a liquid by allowing them to settle under quiescent conditions.
                       An ideal sedimentation process can be created in the laboratory in the form of a batch column. The column
                       is filled with the suspension (turbid river water, industrial wastewater, or sewage) and samples are taken over
                       time from sampling ports located at several depths along the column. Sedimentation efficiency is
                       measured by the solids concentration (or the fraction of solids removed), determined as a function of
                       time and depth.
                        The data come from a quiescent batch settling test. At the beginning of the test, the concentration is
                       uniform over the depth of the test settling column. The mass of solids in the column initially is
                       $M = C_0 Z A$, where $C_0$ is the initial concentration (g/m$^3$), $Z$ is the water depth in the settling
                       column (m), and $A$ is the cross-sectional area of the column (m$^2$). This is shown in the left-hand
                       panel of Figure 38.1.
                        After settling has progressed for time t, the concentration near the bottom of the column has increased
                       relative to the concentration at the top, giving a solids concentration profile that is a function of depth
                       at any time t. With depth measured downward from the water surface, the mass of solids remaining above
                       depth z is $M(z, t) = A\int_0^{z} C(z', t)\,dz'$. The total mass of solids in the column is still
                       $M = C_0 Z A$. This is shown in the right-hand panel of Figure 38.1.
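The mass above a given depth can be evaluated from concentrations measured at the sampling ports. The sketch below assumes NumPy, depths measured downward from the water surface, and a concentration value at the surface itself (in practice the top value may have to be extrapolated from the highest port). It approximates A times the integral of C(z, t) over depth with the trapezoidal rule. The function name mass_above_depth and all numerical values are hypothetical, chosen only to show the arithmetic.

```python
import numpy as np

def mass_above_depth(z_samples, c_samples, area):
    """Mass of solids between the surface (z = 0) and the deepest sampled depth.

    z_samples: depths below the water surface (m), increasing, starting at 0 (assumed)
    c_samples: solids concentrations at those depths (g/m^3) at one time t
    area:      cross-sectional area of the column (m^2)
    Approximates A * integral of C(z, t) dz with the trapezoidal rule.
    """
    z = np.asarray(z_samples, dtype=float)
    c = np.asarray(c_samples, dtype=float)
    # Sum of trapezoids: mean concentration of each depth interval times its thickness
    return area * float(np.sum(0.5 * (c[1:] + c[:-1]) * np.diff(z)))

# Hypothetical column: C0 in g/m^3, Z in m, A in m^2
C0, Z, A = 200.0, 2.0, 0.05
print("initial mass M = C0*Z*A:", C0 * Z * A, "g")

# At t = 0 the profile is uniform, so the integral reproduces C0*Z*A
print("uniform profile:", mass_above_depth([0.0, 0.5, 1.0, 1.5, 2.0], [C0] * 5, A), "g")

# Hypothetical profile at a later time, down to z = 1.0 m
print("mass above 1.0 m:", mass_above_depth([0.0, 0.5, 1.0], [50.0, 120.0, 180.0], A), "g")
```

The uniform case is a useful check: when C(z, t) = C_0 everywhere, the integral over the full depth Z must equal C_0 Z A.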




                       © 2002 By CRC Press LLC