Page 328 - Statistics for Environmental Engineers
38
Empirical Model Building by Linear Regression
KEY WORDS: all possible regressions, analysis of variance, coefficient of determination, confidence
interval, diagnostic checking, empirical models, F test, least squares, linear regression, overfitting,
parsimonious model, polynomial, regression sum of squares, residual plot, residual sum of squares,
sedimentation, solids removal, standard error, t statistic, total sum of squares.
Empirical models are widely used in engineering. Sometimes the model is a straight line; sometimes a
mathematical French curve — a smooth interpolating function — is needed. Regression provides the
means for selecting the complexity of the French curve that can be supported by the available data.
Regression begins with the specification of a model to be fitted. One goal is to find a parsimonious
model, that is, an adequate model with the fewest possible terms. Sometimes the proposed model turns out
to be too simple and we need to augment it with additional terms. The much more common case,
however, is to start with more terms than are needed or justified. This is called overfitting. Overfitting
is harmful because the prediction error of the model is proportional to the number of parameters in
the model: for a linear model with p parameters fitted to n observations with error variance σ², the
variance of the predicted values, averaged over the n observed settings, is pσ²/n.
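The cost of extra parameters can be checked numerically. The sketch below is an illustration rather than part of the original analysis: it assumes a polynomial model in one independent variable and a hypothetical design of 20 equally spaced settings. The function name `average_prediction_variance` is also an assumption. It computes the diagonal of the hat matrix (the leverages) and shows that the variance of the fitted values, averaged over the design points, equals pσ²/n.

```python
import numpy as np

def average_prediction_variance(x, degree, sigma2=1.0):
    """Average variance of the fitted values for a polynomial of the given degree."""
    # Design matrix with columns 1, x, x^2, ..., x^degree (p = degree + 1 parameters).
    X = np.vander(x, degree + 1, increasing=True)
    # With X = QR, the hat matrix is H = Q Q^T, so its diagonal (the leverages)
    # is the row-wise sum of squares of Q. Var(y_hat_i) = sigma2 * H_ii.
    Q, _ = np.linalg.qr(X)
    leverages = np.sum(Q**2, axis=1)
    return sigma2 * leverages.mean()

# Hypothetical design: 20 equally spaced settings of one independent variable.
x = np.linspace(0.0, 1.0, 20)
for degree in range(1, 5):
    p = degree + 1
    avg_var = average_prediction_variance(x, degree)
    print(f"p = {p}: average variance = {avg_var:.4f}, p/n = {p / len(x):.4f}")
```

The two printed columns agree for every degree, so each added term raises the average prediction variance by σ²/n whether or not the term is needed.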
A fitted model is always checked for inadequacies. The statistical output of regression programs is
somewhat helpful in doing this, but a more satisfying and useful approach is to make diagnostic plots
of the residuals. As a minimum, the residuals should be plotted against the predicted values of the fitted
model. Plots of residuals against the independent variables are also useful. This chapter illustrates how
this diagnosis is used to decide whether terms should be added or dropped to improve a model. If a tentative
model is modified, it is refitted and rechecked. The model builder thus works iteratively toward the
simplest adequate model.
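A minimal sketch of one pass through this fit-and-check cycle follows. The data are hypothetical (a curved response with small random error), and the helper name `fit_polynomial` is an assumption. A straight line is fitted first; its residuals, listed against the predicted values, are positive at both ends and negative in the middle, which suggests that a quadratic term is missing.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least squares polynomial fit; returns (predicted values, residuals)."""
    X = np.vander(x, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    predicted = X @ coef
    return predicted, y - predicted

# Hypothetical data: a curved response with small random error.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 11)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0.0, 0.01, x.size)

# Tentative model (straight line) and augmented model (quadratic).
pred_line, res_line = fit_polynomial(x, y, degree=1)
pred_quad, res_quad = fit_polynomial(x, y, degree=2)

# Tabulate residuals against predicted values; these pairs are what a
# residual plot would display.
print("predicted  residual (line)")
for y_hat, e in zip(pred_line, res_line):
    print(f"{y_hat:9.3f}  {e:+.3f}")
print("largest |residual|, line:     ", np.abs(res_line).max())
print("largest |residual|, quadratic:", np.abs(res_quad).max())
```

After the quadratic term is added, the residuals shrink to the size of the random error and lose their pattern, so no further terms are indicated for these data.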
A Model of Sedimentation
Sedimentation removes solid particles from a liquid by allowing them to settle under quiescent conditions.
An ideal sedimentation process can be created in the laboratory in the form of a batch column. The column
is filled with the suspension (turbid river water, industrial wastewater, or sewage) and samples are taken over
time from sampling ports located at several depths along the column. The measure of sedimentation efficiency
will be the solids concentration (or the fraction of solids removed), measured as a function of
time and depth.
The data come from a quiescent batch settling test. At the beginning of the test, the concentration is
uniform over the depth of the test settling column. The mass of solids in the column initially is
$M = C_0 Z A$, where $C_0$ is the initial concentration (g/m$^3$), $Z$ is the water depth in the settling column (m), and
$A$ is the cross-sectional area of the column (m$^2$). This is shown in the left-hand panel of Figure 38.1.
After settling has progressed for time t, the concentration near the bottom of the column has increased
relative to the concentration at the top to give a solids concentration profile that is a function of depth
at any time t. The mass of solids remaining above depth $z$, with depth measured downward from the water surface, is
$M(z, t) = A \int_0^z C(z', t)\,dz'$. The total mass of solids
in the column is still $M = C_0 Z A$. This is shown in the right-hand panel of Figure 38.1.
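The mass balance can be evaluated numerically from concentrations sampled at discrete depths. The sketch below is illustrative only. The column dimensions, the initial concentration, and the concentration profile at time t are hypothetical values; the function name `mass_above` and the use of the trapezoidal rule are assumptions. The fraction removed above a depth is taken here, also as an assumption, to be one minus the ratio of the remaining mass to the initial mass above that depth.

```python
import numpy as np

def mass_above(depths, conc, area):
    """Mass of solids (g) above each sampled depth: A * integral of C from 0 to z.

    The integral is approximated with the trapezoidal rule; depths (m) start
    at the water surface (0) and increase downward, conc is in g/m^3.
    """
    dz = np.diff(depths)
    layer_mass = 0.5 * (conc[1:] + conc[:-1]) * dz
    return area * np.concatenate(([0.0], np.cumsum(layer_mass)))

# Hypothetical column: C0 = 500 g/m^3, Z = 2 m, A = 0.02 m^2.
C0, Z, A = 500.0, 2.0, 0.02
depths = np.linspace(0.0, Z, 5)            # sampling depths (m)

# Uniform initial profile: the integral reproduces M = C0 * Z * A.
initial = mass_above(depths, np.full_like(depths, C0), A)
print("initial mass in column (g):", initial[-1], "vs C0*Z*A =", C0 * Z * A)

# Hypothetical profile after settling time t: concentration rises with depth.
conc_t = np.array([120.0, 260.0, 380.0, 470.0, 540.0])
remaining = mass_above(depths, conc_t, A)

# Fraction of solids removed above each depth below the surface.
fraction_removed = 1.0 - remaining[1:] / initial[1:]
for z, f in zip(depths[1:], fraction_removed):
    print(f"z = {z:.1f} m: fraction removed above z = {f:.3f}")
```

With real data, the accuracy of the integral depends on how closely the sampling ports are spaced.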
© 2002 By CRC Press LLC