Page 346 - Statistics for Environmental Engineers




                       [Figure: plot of pH (vertical axis, 5.5 to 7.0) against weak acidity (horizontal axis, 0 to 700 µg/L)]

                       FIGURE 40.1 The relation of pH and weak acidity data of Cosby Creek after three storms.

                        Begin by considering data from a single category. The quantitative predictor variable $x_1$ is used
                       to predict the dependent variable $y_1$ through the linear model:

                                                      $$y_{1i} = \beta_0 + \beta_1 x_{1i} + e_i$$

                       where $\beta_0$ and $\beta_1$ are parameters to be estimated by least squares.
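
A minimal sketch of the single-category least-squares fit follows, using the closed-form estimates for a straight line. The function name `fit_line` and the data values are hypothetical and chosen only for illustration; they are not the Cosby Creek measurements.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sum of squares of x and corrected sum of cross products
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1


# Hypothetical illustration data: weak acidity (ug/L) and pH
x1 = [100.0, 200.0, 300.0, 400.0, 500.0]
y1 = [6.8, 6.5, 6.3, 6.0, 5.8]

b0, b1 = fit_line(x1, y1)
print(f"intercept = {b0:.4f}, slope = {b1:.5f}")
```

The intercept is computed from the slope and the two means, so the fitted line always passes through the point of means.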
                        If there are data from two categories (e.g., data produced at two different laboratories), one approach
                       would be to model the two sets of data separately as:

                                                      $$y_{1i} = \alpha_0 + \alpha_1 x_{1i} + e_i$$

                       and

                                                      $$y_{2i} = \beta_0 + \beta_1 x_{2i} + e_i$$

                       and then to compare the estimated intercepts ($\alpha_0$ and $\beta_0$) and the estimated slopes
                       ($\alpha_1$ and $\beta_1$) using confidence intervals or t-tests.
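
The separate-fits approach might be sketched as below. This is an illustration under stated assumptions: the helper `fit_line_with_se` and the data are hypothetical, and the comparison statistic divides the difference of two estimates by the root sum of squares of their standard errors without pooling the residual variances, so it is only an approximate t statistic to be judged against tabulated t values.

```python
import math


def fit_line_with_se(x, y):
    """Return (intercept, slope, se_intercept, se_slope) for y = b0 + b1*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept, se_slope


# Hypothetical data for two categories (e.g., two laboratories)
x_cat1 = [100.0, 200.0, 300.0, 400.0, 500.0]
y_cat1 = [6.8, 6.5, 6.3, 6.0, 5.8]
x_cat2 = [150.0, 250.0, 350.0, 450.0, 550.0]
y_cat2 = [6.9, 6.7, 6.4, 6.3, 6.0]

alpha0, alpha1, se_alpha0, se_alpha1 = fit_line_with_se(x_cat1, y_cat1)
beta0, beta1, se_beta0, se_beta1 = fit_line_with_se(x_cat2, y_cat2)

# Approximate t-like statistics for the differences (variances not pooled)
t_intercept = (alpha0 - beta0) / math.hypot(se_alpha0, se_beta0)
t_slope = (alpha1 - beta1) / math.hypot(se_alpha1, se_beta1)
print(f"slopes: {alpha1:.5f} vs {beta1:.5f}, t = {t_slope:.3f}")
print(f"intercepts: {alpha0:.4f} vs {beta0:.4f}, t = {t_intercept:.3f}")
```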
                        A second, and often better, method is to simultaneously fit a single augmented model to all the data.
                       To construct this model, define a categorical variable Z as follows:

                                          Z = 0  if the data are in the first category
                                          Z = 1  if the data are in the second category


                       The augmented model is:

                                                 $$y_i = \alpha_0 + \alpha_1 x_i + Z(\beta_0 + \beta_1 x_i) + e_i$$

                       With some rearrangement:

                                                 $$y_i = \alpha_0 + \beta_0 Z + \alpha_1 x_i + \beta_1 Z x_i + e_i$$

                       In this last form the regression is done as though there are three independent variables: x, Z, and Zx.
                       The vectors of Z and Zx have to be created from the categorical variable defined above. The four
                       parameters $\alpha_0$, $\beta_0$, $\alpha_1$, and $\beta_1$ are estimated by linear regression.
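
The construction of the Z and Zx vectors and the four-parameter fit can be sketched as follows. The helpers `solve_linear_system` and `fit_augmented` and the data are hypothetical. The sketch solves the normal equations directly, which is adequate for a small illustration; for real data a library regression routine would normally be preferred.

```python
def solve_linear_system(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def fit_augmented(x, y, z):
    """Least-squares fit of y = a0 + b0*Z + a1*x + b1*Z*x + e."""
    # Design matrix columns: 1, Z, x, Zx
    rows = [[1.0, zi, xi, zi * xi] for xi, zi in zip(x, z)]
    p = 4
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)  # [a0, b0, a1, b1]


# Hypothetical data for two categories
x_cat1 = [100.0, 200.0, 300.0, 400.0, 500.0]
y_cat1 = [6.8, 6.5, 6.3, 6.0, 5.8]
x_cat2 = [150.0, 250.0, 350.0, 450.0, 550.0]
y_cat2 = [6.9, 6.7, 6.4, 6.3, 6.0]

# Stack the data and code the category: Z = 0 (first), Z = 1 (second)
x = x_cat1 + x_cat2
y = y_cat1 + y_cat2
z = [0.0] * len(x_cat1) + [1.0] * len(x_cat2)

a0, b0, a1, b1 = fit_augmented(x, y, z)
print(f"a0 = {a0:.4f}, b0 = {b0:.4f}, a1 = {a1:.5f}, b1 = {b1:.5f}")
```

In this coding, `b0` and `b1` estimate how far the second category's intercept and slope differ from those of the first category, so setting Z = 1 gives a second-category line with intercept `a0 + b0` and slope `a1 + b1`.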
                        A model for each category can be obtained by substituting the defined values. For the first category,
                       Z = 0 and:

                                                       $$y_i = \alpha_0 + \alpha_1 x_i + e_i$$
                       © 2002 By CRC Press LLC