Page 328 - Statistics for Environmental Engineers



Empirical Model Building by Linear Regression

KEY WORDS all possible regressions, analysis of variance, coefficient of determination, confidence
interval, diagnostic checking, empirical models, F test, least squares, linear regression, overfitting,
parsimonious model, polynomial, regression sum of squares, residual plot, residual sum of squares,
sedimentation, solids removal, standard error, t statistic, total sum of squares.

Empirical models are widely used in engineering. Sometimes the model is a straight line; sometimes a
mathematical French curve (a smooth interpolating function) is needed. Regression provides the
means for selecting the complexity of the French curve that can be supported by the available data.
Regression begins with the specification of a model to be fitted. One goal is to find a parsimonious
model, that is, an adequate model with the fewest possible terms. Sometimes the proposed model turns out
to be too simple and we need to augment it with additional terms. The much more common case,
however, is to start with more terms than are needed or justified. This is called overfitting. Overfitting
is harmful because the variance of the model's predictions, averaged over the data, grows in proportion
to the number of parameters in the model.
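
The cost of extra parameters can be checked numerically. The sketch below relies on a standard least squares result: if the errors are independent with constant variance sigma^2, the variance of the i-th fitted value is sigma^2 times the i-th diagonal element of the hat matrix, and these diagonal elements sum to p, the number of parameters. The average variance of the fitted values is therefore p*sigma^2/n. The 20-point design and the function name are assumptions made only for illustration.

```python
import numpy as np

def average_prediction_variance(x, degree, sigma2=1.0):
    """Average variance of the fitted values for a polynomial of a given degree.

    For ordinary least squares, Var(y_hat_i) = sigma2 * h_ii, where h_ii is the
    i-th diagonal element of the hat matrix H = X (X'X)^-1 X'.
    """
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    Q, _ = np.linalg.qr(X)                         # H = Q Q' (numerically stable)
    leverage = np.sum(Q**2, axis=1)                # diagonal of H
    return sigma2 * leverage.mean()

# Hypothetical design: 20 equally spaced settings of the independent variable.
x = np.linspace(0.0, 1.0, 20)

for degree in range(1, 6):
    p = degree + 1  # number of fitted parameters
    avg_var = average_prediction_variance(x, degree)
    print(f"p = {p}: average Var(y_hat) = {avg_var:.3f} sigma^2, p/n = {p / x.size:.3f}")
```

Each added polynomial term raises the average prediction variance by sigma^2/n whether or not the term is needed, which is one way to see why unnecessary terms are costly.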
A fitted model is always checked for inadequacies. The statistical output of regression programs is
somewhat helpful in doing this, but a more satisfying and useful approach is to make diagnostic plots
of the residuals. At a minimum, the residuals should be plotted against the predicted values of the fitted
model. Plots of residuals against the independent variables are also useful. This chapter illustrates how
this diagnosis is used to decide whether terms should be added or dropped to improve a model. If a tentative
model is modified, it is refitted and rechecked. The model builder thus works iteratively toward the
simplest adequate model.
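
One pass of this fit, check, and refit cycle might look like the following minimal sketch. The data are synthetic (a curved response with added noise) and are not from the sedimentation experiment. The helper names and the numerical curvature check are assumptions introduced for illustration; the check stands in for inspecting a residual plot by eye.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (assumed for illustration): a curved response with noise.
x = np.linspace(0.0, 10.0, 30)
y = 2.0 + 1.5 * x - 0.12 * x**2 + rng.normal(0.0, 0.3, x.size)

def fit_polynomial(x, y, degree):
    """Least squares polynomial fit; returns predicted values and residuals."""
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    predicted = X @ coef
    return predicted, y - predicted

def curvature_check(x, residuals):
    """Correlation of the residuals with a centered quadratic in x.

    A value far from zero suggests that a curvature term is missing.
    """
    q = (x - x.mean())**2
    return np.corrcoef(q, residuals)[0, 1]

for degree in (1, 2):
    predicted, residuals = fit_polynomial(x, y, degree)
    rss = np.sum(residuals**2)  # residual sum of squares
    check = curvature_check(x, residuals)
    print(f"degree {degree}: RSS = {rss:.2f}, curvature check = {check:+.2f}")
    # The diagnostic plot itself is a scatter of residuals against predicted.
```

For the straight-line fit the residuals are strongly correlated with the quadratic term, which signals that the model is too simple. After the quadratic term is added, that pattern disappears and the residuals can be examined for other inadequacies.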

A Model of Sedimentation
Sedimentation removes solid particles from a liquid by allowing them to settle under quiescent conditions.
An ideal sedimentation process can be created in the laboratory in the form of a batch column. The column
is filled with the suspension (turbid river water, industrial wastewater, or sewage) and samples are taken over
time from sampling ports located at several depths along the column. The measure of sedimentation efficiency
will be the solids concentration (or the fraction of solids removed), which will be measured as a function of
time and depth.
The data come from a quiescent batch settling test. At the beginning of the test, the concentration is
uniform over the depth of the test settling column. The mass of solids in the column initially is
$M = C_0 Z A$, where $C_0$ is the initial concentration (g/m³), $Z$ is the water depth in the settling
column (m), and $A$ is the cross-sectional area of the column (m²). This is shown in the left-hand panel
of Figure 38.1.
After settling has progressed for time t, the concentration near the bottom of the column has increased
relative to the concentration at the top to give a solids concentration profile that is a function of depth
at any time t. The mass of solids remaining above depth z is $M = A \int C(z, t)\,dz$. The total mass of solids
in the column is still $M = C_0 Z A$. This is shown in the right-hand panel of Figure 38.1.
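
The mass balance can be illustrated numerically. In the sketch below every number is hypothetical: the column dimensions, the initial concentration, and especially the linear concentration profile at the later time, which was chosen only because it conserves the total mass and is not taken from the settling data. Depth z is assumed to be measured downward from the water surface, and the integral is approximated with the trapezoidal rule.

```python
import numpy as np

# Hypothetical column (all values assumed for illustration only).
C0 = 500.0   # initial concentration, g/m^3
Z = 2.0      # water depth in the column, m
A = 0.05     # cross-sectional area, m^2

def trapezoid(y, x):
    """Trapezoidal-rule integral of y with respect to x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def uniform_profile(z):
    """Concentration at the start of the test: C0 at every depth."""
    return np.full_like(z, C0)

def later_profile(z):
    """Made-up profile at a later time t: linear in depth, with mean C0."""
    return 2.0 * C0 * z / Z

def mass_above(z_limit, profile, n=201):
    """Mass of solids (g) between the surface and depth z_limit."""
    z = np.linspace(0.0, z_limit, n)
    return A * trapezoid(profile(z), z)

initial_mass = mass_above(Z, uniform_profile)   # should equal C0 * Z * A
total_later = mass_above(Z, later_profile)      # total mass is conserved
upper_later = mass_above(1.0, later_profile)    # mass above z = 1 m
upper_start = mass_above(1.0, uniform_profile)

print(f"initial mass in column: {initial_mass:.2f} g")
print(f"total mass at time t:   {total_later:.2f} g")
print(f"fraction removed from above 1 m: {1.0 - upper_later / upper_start:.2f}")
```

With measured data, the profile function would be replaced by concentrations interpolated between the sampling ports, and the same integral would give the fraction of solids removed above each depth.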
