
Statistics and Data Analysis in Geology - Chapter 6

             to examine as many facets of a problem as possible and to sort out, a posteriori, the
             major factors. The methods discussed in this chapter can be of significant help.


             Multiple Regression

             The first topic we will consider in our final chapter is actually a familiar subject
             under a new and more general guise. This is multiple regression, which includes
             polynomial curve fitting (discussed in Chapter 4) and trend-surface analysis
             (discussed in Chapter 5). However, we will now remove the restrictions that limited
             us to considering change as a function of temporal or spatial coordinates.
             Any observed variable can be considered to be a function of any other variable
             measured on the same samples. In Chapter 4 we considered changes in moisture
             content that occurred with changes in depth in the sediment. We could equally
             well have measured the montmorillonite content of the sediment in the core and
             examined the changes in water content that may accompany changes in
             montmorillonite percentage. In fact, we could have measured several variables,
             perhaps organic content, mean grain size, and bulk density, and we could have
             examined the differences in water content associated with changes in each or all
             of these variables. In a sense, variables may be considered as dimensions and their
             values as coordinates, so we can envision changes occurring “along” a dimension
             defined by a variable such as mineral content. Casting variables as dimensions is
             nothing new; we do this every time we plot two variables against one another,
             because we substitute spatial scales in the plot for the original scales on which
             the variables were measured. Such interchangeability is explicit in the references
             to “p-dimensional space,” which abound in the literature of multivariate analysis.
             Just as trend surfaces are a generalization of curve-fitting procedures to
             two-dimensional space, multiple regression is a further generalization to
             “many-dimensional” space.
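             Because the least-squares normal equations do not care what the columns of data
             represent, a single routine can fit a polynomial curve, a trend surface, or a
             multiple regression; only the columns handed to it change. The Python sketch below
             illustrates this under stated assumptions: the names `solve` and `least_squares` are
             hypothetical, the small Gauss-Jordan solver stands in for whatever matrix routine
             you would normally use, and the data are synthetic values generated from known
             coefficients, not measurements from a core. Passing map coordinates as the two
             columns would, in the same way, give a first-order trend surface.

```python
from typing import Sequence


def solve(a: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve the square system a @ b = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix [a | rhs]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]  # clear column col from row r
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(rows: Sequence[Sequence[float]], y: Sequence[float]) -> list[float]:
    """Fit y = b0 + b1*z1 + ... + bk*zk by the normal equations.

    Each row holds z1..zk for one sample. The routine never asks what the
    columns mean: powers of one variable, map coordinates, or several
    measured variables are all handled identically.
    """
    design = [[1.0, *map(float, row)] for row in rows]  # leading 1.0 for the constant term
    k = len(design[0])
    # Sums of squares and cross products of the columns, and their cross products with y.
    sxx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    sxy = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(sxx, sxy)


# Polynomial curve fitting: the columns are x and x**2 (synthetic, exact quadratic).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
b_poly = least_squares([[x, x ** 2] for x in xs], [1.0 + 2.0 * x + 0.5 * x ** 2 for x in xs])

# Multiple regression: the columns are two different variables measured on each
# sample (hypothetical values standing in for, say, organic content and grain size).
z = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 4.0)]
b_multi = least_squares(z, [3.0 + 0.5 * z1 - 2.0 * z2 for z1, z2 in z])

print(b_poly)   # approximately [1.0, 2.0, 0.5]
print(b_multi)  # approximately [3.0, 0.5, -2.0]
```

             The two fits differ only in how each row is built; the solver is identical.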
                 We will not consider multiple regression in great detail because the theoretical
             and computational essentials have been presented in earlier chapters. You will
             recall from Chapter 4 that polynomial regressions (having one independent variable)
             can be represented by a model equation of the general form
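
             $$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{1i}^2 + \cdots + \beta_m x_{1i}^m + \varepsilon_i \tag{6.1}$$
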
                 The model states that the value of a dependent variable, $y_i$, at a location $i$ is
             equal to a constant term plus the sum of a series of powers of an independent
             variable, $x_{1i}$, also observed at location $i$, plus a random error that is unique
             to location $i$. The equation is linear in the unknown coefficients even though it
             contains powers of $x_1$, so a least-squares solution can be found by solving a set
             of normal equations for the $\beta$ coefficients. These can be expressed in matrix
             form as
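
             $$\mathbf{S}_{xy} = \mathbf{S}_{xx}\,\mathbf{b} \tag{6.2}$$

             with a solution

             $$\mathbf{b} = \mathbf{S}_{xx}^{-1}\,\mathbf{S}_{xy}$$

             where $\mathbf{S}_{xy}$ is a column matrix of the sums of cross products of $y$ with
             $x_1, x_1^2, \ldots, x_1^m$; $\mathbf{S}_{xx}$ is a matrix of sums of squares and cross
             products of the $x_1, x_1^2, \ldots, x_1^m$ powers; and $\mathbf{b}$ estimates
             $\boldsymbol{\beta}$, the column matrix of unknown regression coefficients.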
             In Chapter 4, we found the entries in the various matrices by labeling rows and
             columns and cross multiplying.
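             The cross-multiplication bookkeeping for a polynomial can be sketched in Python.
             The functions below are hypothetical illustrations, not the book's program:
             `cross_product_matrices` labels each row and column with a power of $x_1$ (the
             zeroth power standing for the constant term), so each entry of $\mathbf{S}_{xx}$ is
             the sum of $x_1$ raised to the combined power, and builds $\mathbf{S}_{xy}$ the same
             way; `polynomial_fit` then computes $\mathbf{b} = \mathbf{S}_{xx}^{-1}\mathbf{S}_{xy}$
             using an exact Gauss-Jordan inverse. Exact rational arithmetic from Python's
             `fractions` module is used only so the small example gives clean numbers; a
             practical fit would use floating point and a numerically stable solver rather than
             an explicit inverse.

```python
from fractions import Fraction


def cross_product_matrices(x, y, m):
    """Build S_xx and S_xy for a degree-m polynomial in x.

    Rows and columns are labeled with the powers 0, 1, ..., m (power 0 is the
    constant term), so entry (j, k) of S_xx is the sum of x**(j + k) and
    entry j of S_xy is the sum of x**j * y.
    """
    powers = range(m + 1)
    sxx = [[sum(Fraction(xi) ** (j + k) for xi in x) for k in powers] for j in powers]
    sxy = [sum(Fraction(xi) ** j * yi for xi, yi in zip(x, y)) for j in powers]
    return sxx, sxy


def invert(a):
    """Invert a square matrix exactly by Gauss-Jordan elimination on [A | I]."""
    n = len(a)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("S_xx is singular; need at least m + 1 distinct x values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]  # scale the pivot row so the pivot is 1
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]  # clear column col
    return [row[n:] for row in aug]  # the right half now holds A^-1


def polynomial_fit(x, y, m):
    """Return b = S_xx^-1 S_xy, the least-squares coefficients b0..bm."""
    sxx, sxy = cross_product_matrices(x, y, m)
    inv = invert(sxx)
    return [sum(inv[j][k] * sxy[k] for k in range(m + 1)) for j in range(m + 1)]


# Synthetic example: the points lie exactly on y = 1 + x^2.
coefficients = polynomial_fit([0, 1, 2, 3], [1, 2, 5, 10], m=2)
print(coefficients)  # [Fraction(1, 1), Fraction(0, 1), Fraction(1, 1)]
```

             Because the four synthetic points lie exactly on $y = 1 + x^2$, the quadratic fit
             recovers $b_0 = 1$, $b_1 = 0$, and $b_2 = 1$ with no residual.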

