Page 149 - Statistics and Data Analysis in Geology
Statistics and Data Analysis in Geology - Chapter 6
to examine as many facets of a problem as possible, and sort out, a posteriori, the
major factors. The methods discussed in this chapter can be a significant help.
Multiple Regression
The first topic we will consider in our final chapter is actually a familiar subject
under a new and more general guise. This is multiple regression, which includes
polynomial curve fitting (discussed in Chapter 4) and trend-surface analysis (dis-
cussed in Chapter 5). However, we will now remove the restrictions that limited
us to considerations of change as a function of temporal or spatial coordinates.
Any observed variable can be considered to be a function of any other variable
measured on the same samples. In Chapter 4 we considered changes in moisture
content that occurred with changes in depth in the sediment. We could equally
well have measured the montmorillonite content of the sediment in the core and
examined the changes in water content that may accompany changes in montmo-
rillonite percentage. In fact we could have measured several variables, perhaps
organic content, mean grain size, and bulk density, and we could have examined
the differences in water content associated with changes in each or all of these
variables. In a sense, variables may be considered as dimensions, and their values
as coordinates, so we can envision changes occurring “along” a dimension defined
by a variable such as mineral content. Casting variables as dimensions is nothing
new; we do this every time we plot two variables against one another, because
we substitute spatial scales on the plot for the original scales on which the
variables were measured. Such interchangeability is explicit in the references to
“p-dimensional space” that abound in the literature of multivariate analysis. Just as
trend surfaces are a generalization of curve-fitting procedures to two-dimensional
space, multiple regression is a further generalization to “many-dimensional” space.
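The idea that each variable acts as one dimension can be sketched in Python with NumPy. This is a minimal illustration, not the book's procedure: it assumes the independent variables are stored as the columns of an array, adds a column of ones for the constant term, and uses `numpy.linalg.lstsq` for the least-squares fit. The function name `fit_multiple_regression` and the data are hypothetical; the response is built without noise from known coefficients so that the fit can be checked.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp.

    X is an (n, p) array whose columns are the independent variables,
    each treated as one "dimension"; y has length n.
    Returns the coefficients [b0, b1, ..., bp].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the constant term b0 is estimated too.
    A = np.column_stack([np.ones(len(y)), X])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

# Hypothetical, noise-free illustration: three made-up predictors
# (standing in for, e.g., organic content, mean grain size, bulk density)
# and a response built from known coefficients.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(20, 3))
true_b = np.array([5.0, 0.8, -1.5, 2.0])
y = true_b[0] + X @ true_b[1:]

b = fit_multiple_regression(X, y)
print(np.round(b, 6))
```

Adding another variable only adds a column to `X`; the fitting step is unchanged, which is the sense in which multiple regression generalizes curve fitting and trend surfaces.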
We will not consider multiple regression in great detail because the theoretical
and computational essentials have been presented in earlier chapters. You will re-
call from Chapter 4 that polynomial regressions (having one independent variable)
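can be represented in a model equation of the general form
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{1i}^2 + \cdots + \beta_m x_{1i}^m + \varepsilon_i \qquad (6.1)$$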
The model states that the value of a dependent variable, $y_i$, at a location $i$ is
equal to a constant term plus the sum of a series of powers of an independent
variable, $x_{1i}$, also observed at location $i$, plus a random error that is unique to
location $i$. Because the model is linear in its coefficients, a least-squares solution
can be found by solving a set of normal equations for the $\beta$ coefficients. These
equations can be expressed in matrix form as
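$$\mathbf{S}_{xy} = \mathbf{S}_{xx}\,\mathbf{b} \qquad (6.2)$$
with a solution
$$\mathbf{b} = \mathbf{S}_{xx}^{-1}\,\mathbf{S}_{xy} \qquad (6.3)$$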
where $\mathbf{S}_{xy}$ is a column matrix of the sums of cross products of $y$ with $x_1, x_1^2, \ldots,
x_1^m$; $\mathbf{S}_{xx}$ is a matrix of sums of squares and cross products of the $x_1, x_1^2, \ldots, x_1^m$
powers; and $\mathbf{b}$ estimates $\boldsymbol{\beta}$, the column matrix of unknown regression coefficients.
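The normal equations in matrix form, and their solution for b, can be sketched for the polynomial case with NumPy. This is a minimal sketch under stated assumptions: the constant term is handled by including the zeroth power (a column of ones), so the first row of S_xx holds n, Σx, Σx², and so on; the helper name `polynomial_normal_equations` is hypothetical; and `numpy.linalg.solve` is used in place of an explicit inverse, which yields the same b but is numerically preferable.

```python
import numpy as np

def polynomial_normal_equations(x, y, m):
    """Fit y = b0 + b1*x + ... + bm*x**m by solving S_xx b = S_xy."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns 1, x, x^2, ..., x^m: each power is treated as a separate variable.
    A = np.vander(x, m + 1, increasing=True)
    S_xx = A.T @ A   # sums of squares and cross products of the powers
    S_xy = A.T @ y   # sums of cross products of y with each power
    # Same result as inv(S_xx) @ S_xy, without forming the inverse.
    b = np.linalg.solve(S_xx, S_xy)
    return b, S_xx, S_xy

# Hypothetical check: data generated exactly from y = 2 - 3x + 0.5x^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 - 3.0 * x + 0.5 * x**2
b, S_xx, S_xy = polynomial_normal_equations(x, y, m=2)
print(b)
```

Raising m adds further powers as columns; for large m these columns become nearly collinear and S_xx becomes ill-conditioned, which is one reason to avoid computing its inverse explicitly.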
In Chapter 4, we found the entries in the various matrices by labeling rows and
columns and cross multiplying.
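The hand procedure from Chapter 4, labeling rows and columns by the powers of x and cross multiplying, can be mimicked in plain Python. In this sketch (the function names `cross_product_sums` and `solve_straight_line` are hypothetical), entry (j, k) of S_xx is the sum of x^j · x^k over all samples, entry j of S_xy is the sum of x^j · y, and power 0 stands for the constant term. For the straight-line case (m = 1) the resulting 2 × 2 system is solved with Cramer's rule, a choice made here only because it is short.

```python
def cross_product_sums(x, y, m):
    """Build S_xx and S_xy entry by entry, as in a hand tabulation.

    Rows and columns are labeled by the powers 0, 1, ..., m of x;
    power 0 (all ones) corresponds to the constant term.
    """
    labels = range(m + 1)
    # Entry (j, k): sum over samples of x**j times x**k.
    S_xx = [[sum(xi**j * xi**k for xi in x) for k in labels] for j in labels]
    # Entry j: sum over samples of x**j times y.
    S_xy = [sum(xi**j * yi for xi, yi in zip(x, y)) for j in labels]
    return S_xx, S_xy


def solve_straight_line(S_xx, S_xy):
    """Solve the 2 x 2 normal equations (m = 1) by Cramer's rule."""
    (a11, a12), (a21, a22) = S_xx
    det = a11 * a22 - a12 * a21
    b0 = (S_xy[0] * a22 - a12 * S_xy[1]) / det
    b1 = (a11 * S_xy[1] - a21 * S_xy[0]) / det
    return b0, b1


# Small made-up example: three samples, straight line (m = 1).
S_xx, S_xy = cross_product_sums([1, 2, 3], [2, 3, 5], 1)
print(S_xx, S_xy)   # [[3, 6], [6, 14]] [10, 23]
print(solve_straight_line(S_xx, S_xy))
```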