Page 166 - Statistics and Data Analysis in Geology
Analysis of Multivariate Data
be relatively easy. As an exercise, it may be instructive to calculate the significance
of the discriminant function for the example we have just worked.
Not all of the variables we have included in the discriminant function will be
equally useful in distinguishing one group from another. We may wish to isolate
those variables that are not especially helpful and eliminate them from future
analyses. Selecting the most effective set of discriminators for discriminant function
analysis would seem to be analogous to selecting the most efficient predictors
in multiple regression. The problem, however, is more complicated because the
“dependent” or predicted variable in a discriminant function is composed of differences
between two sets of the same variables that are used as “independent”
predictors of the discrimination. Unlike regression, where the sums of squares
of y do not change as different variables X_j are added to the equation, the sums
of squares of the differences between groups A and B do change as variables are
added or deleted.
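The point that the between-group separation itself changes with the choice of variables can be illustrated with a minimal sketch. It assumes that separation is measured by the Mahalanobis distance D² computed from the pooled covariance matrix, which this passage does not state explicitly. The data are synthetic, and the names `pooled_covariance` and `mahalanobis_d2` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def pooled_covariance(a, b):
    """Pooled covariance matrix of two groups (rows = observations)."""
    na, nb = len(a), len(b)
    # atleast_2d keeps the single-variable case as a 1 x 1 matrix.
    sa = np.atleast_2d(np.cov(a, rowvar=False))
    sb = np.atleast_2d(np.cov(b, rowvar=False))
    return ((na - 1) * sa + (nb - 1) * sb) / (na + nb - 2)

def mahalanobis_d2(a, b, variables):
    """Assumed separation measure: D^2 between group means, using only
    the listed variable columns."""
    a = np.asarray(a, dtype=float)[:, variables]
    b = np.asarray(b, dtype=float)[:, variables]
    diff = a.mean(axis=0) - b.mean(axis=0)
    sp = pooled_covariance(a, b)
    # Solve sp * x = diff rather than inverting sp explicitly.
    return float(diff @ np.linalg.solve(sp, diff))

# Synthetic groups A and B with three variables (illustrative only).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(20, 3))
group_b = rng.normal(loc=[1.0, 0.5, 0.0], scale=1.0, size=(20, 3))

subsets = [[0], [0, 1], [0, 1, 2]]
d2_values = [mahalanobis_d2(group_a, group_b, s) for s in subsets]
for s, d2 in zip(subsets, d2_values):
    print(s, round(d2, 3))
```

Under this measure D² cannot decrease when a variable is added to the set, so an increase by itself does not show that the new variable is a useful discriminator.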
Some idea of the effectiveness of the variables as discriminators can be gained
by computing the standardized differences,
$$D_j = \frac{\bar{A}_j - \bar{B}_j}{s_{p_j}} \qquad (6.28)$$
This is simply the difference between the means of the two groups A and B for
variable j, divided by the pooled standard deviation of variable j. Since the measure
does not consider interactions between variables, it is useful only as a general
guide to discriminating power. Stepwise discriminant analysis programs may use
standardized differences in choosing the order in which variables are added to the
discriminant function. Marascuilo and Levin (1983) discuss “after-the-fact” contrast
procedures that can be used to select the most important variables. However,
the significance of different combinations of variables can be tested only by computing
the various functions and determining the relative amounts of separation
the different equations produce between the two groups. To avoid bias, such tests
should be run on independent random samples.
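The standardized difference described above (difference between the group means of variable j, divided by the pooled standard deviation of variable j) can be sketched in a few lines. This is a minimal illustration, assuming the pooled variance weights each group's sample variance by its degrees of freedom; the function names and the measurements are made up for the example and do not come from the text.

```python
from math import sqrt
from statistics import mean, variance

def pooled_std(a, b):
    """Pooled standard deviation of one variable measured in two groups."""
    na, nb = len(a), len(b)
    # Weight each group's sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return sqrt(pooled_var)

def standardized_differences(group_a, group_b):
    """Return the standardized difference for each variable j.
    Rows are observations, columns are variables."""
    cols_a = list(zip(*group_a))  # transpose rows into per-variable columns
    cols_b = list(zip(*group_b))
    return [(mean(a) - mean(b)) / pooled_std(a, b)
            for a, b in zip(cols_a, cols_b)]

# Made-up measurements: two variables per observation.
group_a = [(1.0, 10.0), (2.0, 12.0), (3.0, 11.0), (4.0, 13.0)]
group_b = [(5.0, 11.0), (6.0, 12.0), (7.0, 10.0), (8.0, 14.0)]

d = standardized_differences(group_a, group_b)
# Rank variables by absolute standardized difference, largest first.
ranking = sorted(range(len(d)), key=lambda j: abs(d[j]), reverse=True)
print(d, ranking)
```

In this made-up data the first variable separates the group means far more strongly than the second, so it would be ranked first. Because each variable is treated on its own, the ranking ignores correlations between variables and is only a rough guide.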
Discriminant function analysis provides a natural transition between two major
classes of multivariate statistical techniques. On one hand, it is closely related to
multiple regression and trend-surface analysis. On the other, it can be expressed
as an eigenvalue problem, related to principal component analysis, factor analysis,
and similar multivariate methods. There are advantages to the use of eigenvectors
in calculating the discriminant function, because they allow us to simultaneously
discriminate between more than two groups. However, we will delay a consideration
of this topic until we examine the basic elements of eigenvector analysis and some
of the simpler eigenvector techniques.
Multivariate Extensions of Elementary Statistics
In Chapter 2, we considered some simple geologic problems that could be examined
by elementary statistical methods. We will begin our consideration of multivariate
methods in geology with some direct extensions of these simple tests. You will
recall that the variation measured in most naturally occurring phenomena could be
described by the normal distribution. This is a reflection of the central limit theorem,
which states that observations which are the sums of many independently operating
processes tend to be normally distributed as the number of effects becomes