Page 346 - Statistics for Environmental Engineers


FIGURE 40.1 The relation of pH and weak acidity data of Cosby Creek after three storms. [Plot omitted: pH (vertical axis, 5.5 to 7.0) versus weak acidity (horizontal axis, 0 to 700 µg/L).]

Begin by considering data from a single category. The quantitative predictor variable is $x_1$, which can be used to
predict the dependent variable $y_1$ using the linear model:

$$y_{1i} = \beta_0 + \beta_1 x_{1i} + e_i$$

where $\beta_0$ and $\beta_1$ are parameters to be estimated by least squares.
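
As a minimal sketch of the single-category fit, the least-squares estimates for a straight line can be computed from the usual closed-form sums. The function name `fit_line` and the data values are assumptions made up for illustration; they are not the Cosby Creek measurements.

```python
def fit_line(x, y):
    """Return least-squares estimates (b0, b1) for y = b0 + b1*x + e."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Corrected sum of squares of x and corrected cross-product of x and y.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # estimated slope
    b0 = y_mean - b1 * x_mean    # estimated intercept
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x (illustration only).
x1 = [0.0, 1.0, 2.0, 3.0]
y1 = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_line(x1, y1)
print(b0, b1)
```

Because the made-up points lie exactly on a line, the estimates recover that line's intercept and slope; with real data the residuals $e_i$ would be nonzero.
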
If there are data from two categories (e.g., data produced at two different laboratories), one approach
would be to model the two sets of data separately as:

$$y_{1i} = \alpha_0 + \alpha_1 x_{1i} + e_i$$

and

$$y_{2i} = \beta_0 + \beta_1 x_{2i} + e_i$$

and then to compare the estimated intercepts ($\alpha_0$ and $\beta_0$) and the estimated slopes ($\alpha_1$ and $\beta_1$) using
confidence intervals or t-tests.
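
The separate-fits approach might be sketched as follows. Each category is fitted on its own, standard errors are computed from the residual variance, and the two slopes (or intercepts) are compared with a simple ratio of their difference to its standard error. The helper names, the unpooled form of the comparison statistic, and the data are all assumptions for illustration; the critical value to compare against is left to the reader.

```python
import math


def fit_line_with_se(x, y):
    """Fit y = b0 + b1*x and return (b0, b1, se_b0, se_b1)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1.0 / n + x_mean ** 2 / sxx))
    return b0, b1, se_b0, se_b1


def difference_statistic(est_a, se_a, est_b, se_b):
    """Difference of two independent estimates divided by its standard error."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Hypothetical data for two categories (e.g., two laboratories).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y_cat1 = [1.1, 2.9, 5.2, 6.8, 9.0]
y_cat2 = [2.0, 3.1, 3.9, 5.2, 5.8]

a0, a1, se_a0, se_a1 = fit_line_with_se(x, y_cat1)
c0, c1, se_c0, se_c1 = fit_line_with_se(x, y_cat2)
t_slope = difference_statistic(a1, se_a1, c1, se_c1)
t_intercept = difference_statistic(a0, se_a0, c0, se_c0)
print(a1, c1, t_slope, t_intercept)
```
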
A second, and often better, method is to simultaneously fit a single augmented model to all the data.
To construct this model, define a categorical variable Z as follows:

Z = 0 if the data are in the first category
Z = 1 if the data are in the second category

The augmented model is:

$$y_i = \alpha_0 + \alpha_1 x_i + Z(\beta_0 + \beta_1 x_i) + e_i$$

With some rearrangement:

$$y_i = \alpha_0 + \beta_0 Z + \alpha_1 x_i + \beta_1 Z x_i + e_i$$

In this last form the regression is done as though there are three independent variables, x, Z, and Zx.
The vectors of Z and Zx have to be created from the categorical variable defined above. The four
parameters $\alpha_0$, $\beta_0$, $\alpha_1$, and $\beta_1$ are estimated by linear regression.
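
One way to sketch the augmented fit is to build the columns 1, Z, x, and Zx explicitly and solve the least-squares normal equations. The function names, the small Gaussian-elimination solver, and the data are assumptions for illustration; in practice a regression routine from a statistics package would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_augmented(x, y, z):
    """Estimate (alpha0, beta0, alpha1, beta1) in
    y = alpha0 + beta0*Z + alpha1*x + beta1*Z*x + e."""
    # Each row holds the regressors: constant, Z, x, and the product Z*x.
    rows = [[1.0, zi, xi, zi * xi] for xi, zi in zip(x, z)]
    p = 4
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: first five points in category 1 (Z = 0), last five in category 2 (Z = 1).
x = [0.0, 1.0, 2.0, 3.0, 4.0] * 2
z = [0.0] * 5 + [1.0] * 5
y = [1.1, 2.9, 5.2, 6.8, 9.0, 2.0, 3.1, 3.9, 5.2, 5.8]

alpha0, beta0, alpha1, beta1 = fit_augmented(x, y, z)
print(alpha0, beta0, alpha1, beta1)
```

In this parameterization the coefficients on Z and Zx measure how the second category's intercept and slope differ from those of the first category, so a value of $\beta_0$ or $\beta_1$ near zero suggests the two categories share that parameter.
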
A model for each category can be obtained by substituting the defined values. For the first category,
Z = 0 and:
