Page 166 - Statistics and Data Analysis in Geology

Analysis of  Multivariate Data

             be relatively easy. As an exercise, it may be instructive to calculate the significance
             of the discriminant function for the example we have just worked.
                 Not all of  the variables we have included in the discriminant function will be
             equally useful in distinguishing one group from another.  We may wish  to iso-
             late those variables that are not especially helpful and eliminate them from future
             analyses. Selecting the most effective set of  discriminators for discriminant func-
             tion analysis would seem to be analogous to selecting the most efficient predictors
             in multiple regression.  The problem, however, is more complicated because the
             “dependent”  or predicted variable in a discriminant function is composed of  dif-
             ferences between two sets of  the same variables that are used as “independent”
             predictors of the discrimination. Unlike regression, where the sums of squares
             of y do not change as different variables X_j are added to the equation, the sums
             of squares of the differences between groups A and B do change as variables are
             added or deleted.
                 Some idea of  the effectiveness of  the variables as discriminators can be gained
             by computing the standardized differences,


                 $$ d_j = \frac{\bar{A}_j - \bar{B}_j}{s_{p_j}} \tag{6.28} $$


             This is simply the difference between the means of  the two groups A  and B  for
             variable j, divided by the pooled standard deviation of  variable j. Since the mea-
             sure does not consider interactions between variables, it is useful only as a general
             guide to discriminating power.  Stepwise discriminant analysis programs may use
             standardized differences  in choosing the order in which variables are added to the
             discriminant function.  Marascuilo and Levin (1983) discuss “after-the-fact” con-
             trast procedures that can be used to select the most important variables. However,
             the significance of  different combinations of  variables can be tested only by com-
             puting the various functions and determining the relative amounts of  separation
             the different equations produce between the two groups. To avoid bias, such tests
             should be run on independent random samples.
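
A minimal sketch of the standardized differences described above, written in Python with NumPy. The function name, the array layout (one row per observation, one column per variable), and the sample values are illustrative assumptions, not taken from the text. The pooled standard deviation is assumed here to be the usual two-sample form with n_A + n_B - 2 degrees of freedom.

```python
import numpy as np

def standardized_differences(group_a, group_b):
    """Return (mean of A - mean of B) / pooled std. dev. for each variable.

    Rows are observations and columns are variables (an assumed layout).
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = a.shape[0], b.shape[0]

    mean_a = a.mean(axis=0)
    mean_b = b.mean(axis=0)

    # Corrected sums of squares within each group, per variable.
    ss_a = ((a - mean_a) ** 2).sum(axis=0)
    ss_b = ((b - mean_b) ** 2).sum(axis=0)

    # Pooled variance, assuming n_a + n_b - 2 degrees of freedom.
    pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)

    return (mean_a - mean_b) / np.sqrt(pooled_var)

# Hypothetical data: two variables measured on three samples per group.
group_a = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
group_b = [[4.0, 10.0], [5.0, 12.0], [6.0, 14.0]]

d = standardized_differences(group_a, group_b)
print(d)
```

In this made-up example the first variable separates the groups by three pooled standard deviations, while the second has identical group means and a standardized difference of zero. As the text cautions, each value treats its variable in isolation, so the ranking is only a rough guide to discriminating power.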
                 Discriminant function analysis provides a natural transition between two major
             classes of  multivariate statistical techniques.  On one hand, it is closely related to
             multiple regression and trend-surface analysis. On the other, it can be expressed
             as an eigenvalue problem, related to principal component analysis, factor analysis,
             and similar multivariate methods. Calculating the discriminant function with
             eigenvectors has an advantage: it allows us to discriminate among more than
             two groups simultaneously. However, we will defer consideration of this topic
             until we have examined the basic elements of eigenvector analysis and some
             of the simpler eigenvector techniques.

              Multivariate Extensions of Elementary Statistics

             In Chapter 2, we considered some simple geologic problems that could be examined
             by elementary statistical methods. We will begin our consideration of multivariate
             methods in geology with some direct extensions of  these simple tests.  You  will
             recall that the variation measured in most naturally occurring phenomena could be
             described by the normal distribution. This is a reflection of the central limit
             theorem, which states that observations that are the sums of many independently
             operating processes tend to be normally distributed as the number of effects becomes
