of one of the techniques. These methods are well described in some of the texts
listed in the Selected Readings at the end of the chapter, especially in Marascuilo
and Levin (1983) and in Draper and Smith (1998).
The backward elimination procedure consists of computing a regression in-
cluding all possible variables and selecting the least significant variable. The selec-
tion proceeds by examining the standardized partial regression coefficients for the
smallest value and then recomputing the regression, omitting that variable. The
significance of the deleted variable is tested by the analysis of variance shown in
Table 6-3. If the variable is not making a significant contribution to the regres-
sion, it is permanently discarded. The reduced regression model is then fitted to
the data, a new set of standardized partial regression coefficients for the reduced
equation is calculated, and the process is repeated. At each step, the regression
equation is reduced by one variable, until all remaining variables are significant.
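A minimal sketch of this elimination loop, written in Python with NumPy and SciPy, is given below. The function names ols_fit and backward_elimination are illustrative only and belong to no standard package; the F-test on the candidate variable plays the role of the analysis of variance of Table 6-3, comparing the full model against the model with that variable removed.

```python
import numpy as np
from scipy import stats

def ols_fit(X, y):
    """Least-squares fit with an intercept; returns the coefficients and the
    residual sum of squares."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return coef, rss

def backward_elimination(X, y, names, alpha=0.05):
    """Drop, one at a time, the variable with the smallest standardized
    partial regression coefficient, so long as an F-test (full model versus
    reduced model) shows it makes no significant contribution."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        coef, rss_full = ols_fit(X[:, keep], y)
        # standardized partial regression coefficients: b_j * s_xj / s_y
        std_coef = coef[1:] * X[:, keep].std(axis=0, ddof=1) / y.std(ddof=1)
        j = int(np.argmin(np.abs(std_coef)))       # candidate for deletion
        reduced = [v for i, v in enumerate(keep) if i != j]
        _, rss_red = ols_fit(X[:, reduced], y)
        df_res = len(y) - len(keep) - 1            # residual df of the full model
        F = (rss_red - rss_full) / (rss_full / df_res)
        p = 1.0 - stats.f.cdf(F, 1, df_res)
        if p > alpha:                              # not significant: discard it
            print(f"dropping {names[keep[j]]}  (F = {F:.2f}, p = {p:.3f})")
            keep = reduced
        else:                                      # every remaining variable is significant
            break
    return keep
```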
It is instructive to examine the collection of six independent variables mea-
sured on river basins (file KENTUCKY.TXT) and see if any can be discarded without
significantly affecting the multiple regression on basin magnitude. We can find a
minimal set of regressors by examining the standardized partial regression coefficients,
deleting the smallest of these, and recomputing the regression. Repeatedly
running a multiple-regression program obviously is less efficient than using a step-
wise computer program, but it has the advantage that every step in the process can
be examined closely. When you are confident that you understand the elimination
process and the changes that occur in the regression coefficients, you may turn to
a more automated procedure.
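If you prefer to script the repetition rather than rerun a regression program by hand, the sketch given earlier could be driven as follows. The column layout assumed here (six predictor columns followed by basin magnitude, with one header row) is an assumption and should be verified against the actual contents of KENTUCKY.TXT.

```python
# Hypothetical usage -- check the column order and header of KENTUCKY.TXT
# before running; the layout below is assumed, not documented.
data = np.loadtxt("KENTUCKY.TXT", skiprows=1)
X, y = data[:, :6], data[:, 6]
names = ["x1", "x2", "x3", "x4", "x5", "x6"]
kept = backward_elimination(X, y, names)
print("variables retained:", [names[k] for k in kept])
```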
Although multiple regression is “multivariate” in the sense that more than one
variable is measured on each observational unit, it really is a univariate technique
because we are concerned only with the variance of one variable, y. Behavior of
the independent variables, the x’s, is not subject to analysis.
The next topic we will consider is discriminant function analysis, which involves
identification, or the assignment of objects to predefined groups. The discrim-
ination between two alternative groups is a process that is computationally inter-
mediate between univariate procedures and true multivariate methods in which
many variables are considered simultaneously. Two groups, each characterized by
a set of multiple variables, can be discriminated by solving a set of simultaneous
equations almost identical to those involved in multiple regression. The right-hand
vector of the matrix equation, however, does not contain cross products between
independent variables and a single dependent variable, but rather differences be-
tween the multivariate means of the two groups that are to be discriminated.
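As a rough illustration of that parallel (not the notation of any particular table in this chapter), the two-group case can be sketched in a few lines of Python with NumPy: the matrix on the left-hand side is the pooled within-group covariance matrix, and the right-hand vector is the difference between the two group mean vectors. Using the raw pooled sums of squares and cross products instead of the covariance matrix would change only the scale of the coefficients.

```python
import numpy as np

def two_group_discriminant(A, B):
    """Two-group linear discriminant function: solve Sp @ lam = d, where Sp is
    the pooled within-group covariance matrix and d is the vector of
    differences between the two group mean vectors."""
    d = A.mean(axis=0) - B.mean(axis=0)
    na, nb = len(A), len(B)
    # pooled within-group covariance, weighted by degrees of freedom
    Sp = ((na - 1) * np.cov(A, rowvar=False)
          + (nb - 1) * np.cov(B, rowvar=False)) / (na + nb - 2)
    lam = np.linalg.solve(Sp, d)                   # discriminant coefficients
    # the midpoint between the projected group means gives a natural cutoff
    cutoff = lam @ (A.mean(axis=0) + B.mean(axis=0)) / 2
    return lam, cutoff
```

An observation x would then be assigned to the first group when its score, lam @ x, exceeds the cutoff, and to the second group otherwise.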
Tests of discriminant functions involve multivariate extensions of simple uni-
variate statistical tests of equality. These will be considered next, followed by a dis-
cussion of multivariate classification, or the sorting of objects into homogeneous
groups. We will then consider eigenvector techniques, including principal compo-
nent and factor analysis. The final topics will include multivariate extensions of
discriminant analysis and multiple regression.
This list of topics is certainly not all-inclusive. However, the subjects have been
chosen because they have found special utility in the Earth sciences. They include a
wide variety of computational techniques and encompass many fundamental con-
cepts. An understanding of the theory and operational procedures involved in
these methods should provide you with a sufficient background to evaluate other
multivariate techniques as well.