Page 158 - Statistics and Data Analysis in Geology

Analysis of Multivariate Data

             Discriminant Functions
             One of the most widely used multivariate procedures in Earth science is the dis-
             criminant function. We will consider it at length for two reasons: discrimination is
             a powerful statistical tool and it can be regarded as either a way to treat univariate
             problems related to multiple regression, or multivariate problems related to the
             statistical tests we will discuss later. Discriminant functions therefore provide an
             additional link between univariate and multivariate statistics.
                 First, however, we must define the process of discrimination, and carefully
             distinguish it from the related process of classification. Suppose we have assembled
             two collections of shale samples of known freshwater and saltwater origin. We
             may have determined their origin from an examination of their fossil content. A
             number of geochemical variables have been measured on each specimen, including
             the content of vanadium, boron, iron, and so forth. The problem is to find the linear
             combination of these variables that produces the maximum difference between the
             two previously defined groups. If we find a function that produces a significant
             difference, we can use it to allocate new specimens of shale of unknown origin to
             one of the two original groups. In other words, new shale samples, not containing
             diagnostic fossils, can then be categorized as marine or freshwater on the basis of
             the linear discriminant function of their geochemical components. [This problem
             was considered by Potter, Shimp, and Witters (1963).]
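
The two-group procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the computation used by Potter, Shimp, and Witters: the measurements are invented, only two variables (vanadium and boron) are used so that the 2 x 2 matrix can be inverted by hand, and the names `fit_discriminant` and `allocate`, as well as the midpoint cutoff, are hypothetical choices made for the example. The coefficients are taken as the inverse of the pooled within-group covariance matrix multiplied by the difference between the group means, which is the standard two-group linear discriminant solution.

```python
# Hypothetical training data: (vanadium, boron) in ppm for shales of known origin.
freshwater = [(60.0, 40.0), (70.0, 55.0), (65.0, 50.0), (75.0, 45.0)]
marine = [(110.0, 95.0), (120.0, 110.0), (105.0, 100.0), (125.0, 105.0)]


def mean_vector(rows):
    """Mean of each of the two variables."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mean):
    """Sums of squares and cross-products about the group mean (2 x 2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fit_discriminant(group_a, group_b):
    """Return (coefficients, cutoff) for a two-group linear discriminant."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter_matrix(group_a, ma), scatter_matrix(group_b, mb)
    dof = len(group_a) + len(group_b) - 2
    # Pooled within-group covariance matrix.
    p = [[(sa[i][j] + sb[i][j]) / dof for j in range(2)] for i in range(2)]
    det = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    inv = [[p[1][1] / det, -p[0][1] / det],
           [-p[1][0] / det, p[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    coeffs = [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]
    # Cutoff: the score of the point midway between the two group means.
    mid = [(ma[k] + mb[k]) / 2 for k in range(2)]
    cutoff = coeffs[0] * mid[0] + coeffs[1] * mid[1]
    return coeffs, cutoff


def allocate(coeffs, cutoff, specimen):
    """Assign a specimen to group A (freshwater) or group B (marine)."""
    score = coeffs[0] * specimen[0] + coeffs[1] * specimen[1]
    return "freshwater" if score > cutoff else "marine"


coeffs, cutoff = fit_discriminant(freshwater, marine)
print(allocate(coeffs, cutoff, (68.0, 50.0)))  # a new, unlabeled specimen
```

With the coefficients defined this way, specimens from the first group passed to `fit_discriminant` score above the cutoff on average, so the labels in `allocate` must follow the same order as the arguments.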
                 Classification can be illustrated with a similar example. Suppose we have ob-
             tained a large, heterogeneous collection of shale specimens, each of which has been
             geochemically analyzed. On the basis of the measured variables, can the shales be
             separated into groups (or clusters, as they are commonly called) that are both rel-
             atively homogeneous and distinct from other groups? The process by which this
             can be done has been highly developed by numerical taxonomists, and will be con-
             sidered in a later section. There are several obvious differences between these pro-
             cedures and those of discriminant function analysis. A classification is internally
             based; that is, it does not depend on a priori knowledge about relations between
             observations as does a discriminant function. The number of groups in a discrim-
             inant function is set prior to the analysis, while in contrast the number of clusters
             that will emerge from a classification scheme cannot ordinarily be predetermined.
             Similarly, each original observation is defined as belonging to a specific group in
             a discriminant analysis. In most classification procedures, an observation is free
             to enter any cluster that emerges. Other differences will become apparent as we
             examine these two procedures. The result of a cluster analysis of shales would be
             a classification of the observations into several groups. It would then be up to us
             to interpret the geological meaning (if any) of the groups so found.
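
The contrast with discrimination can be made concrete with a small sketch. The code below groups unlabeled specimens by single-linkage merging, which is one simple clustering rule chosen here for brevity and not necessarily the procedure developed later in the text. The data, the function name `single_linkage_clusters`, and the `max_link` distance threshold are all hypothetical. No group labels are supplied, and the number of clusters is not fixed in advance: it depends on the data and on the threshold.

```python
import math

# Hypothetical (vanadium, boron) measurements in ppm; no origin labels are given.
samples = [(60, 40), (65, 50), (70, 55),
           (110, 95), (120, 110), (105, 100),
           (200, 30)]


def single_linkage_clusters(points, max_link):
    """Merge clusters whose closest members lie within max_link of each other.

    Returns a list of clusters, each a sorted list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                closest = min(
                    math.hypot(points[i][0] - points[j][0],
                               points[i][1] - points[j][1])
                    for i in clusters[a] for j in clusters[b]
                )
                if closest <= max_link:
                    # Absorb cluster b into cluster a and rescan from the start.
                    clusters[a].extend(clusters.pop(b))
                    merged = True
                    break
            if merged:
                break
    return [sorted(c) for c in clusters]


for threshold in (20, 60):
    groups = single_linkage_clusters(samples, threshold)
    print(threshold, len(groups), groups)
```

Changing the threshold changes how many clusters emerge, and nothing in the output says what the clusters mean geologically; that interpretation is left to the analyst, as the text notes.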
                 A simple linear discriminant function transforms an original set of measure-
             ments on a specimen into a single discriminant score. That score, or transformed
             variable, represents the specimen’s position along a line defined by the linear dis-
             criminant function. We can therefore think of the discriminant function as a way
             of collapsing a multivariate problem down into a problem which involves only one
             variable.
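
As a minimal sketch of this collapse, the discriminant score is simply a weighted sum of the measured variables. The coefficient values, units, and specimen data below are invented for illustration, and `discriminant_score` is a hypothetical helper name; in practice the coefficients would come from a fitted discriminant function.

```python
# Hypothetical coefficients for three geochemical variables
# (vanadium and boron in ppm, iron in percent).
COEFFICIENTS = {"vanadium": 0.02, "boron": 0.05, "iron": -0.30}


def discriminant_score(specimen, coefficients=COEFFICIENTS):
    """Collapse several measurements into one score: a weighted sum."""
    return sum(weight * specimen[name] for name, weight in coefficients.items())


specimens = [
    {"vanadium": 100.0, "boron": 80.0, "iron": 5.0},
    {"vanadium": 60.0, "boron": 40.0, "iron": 3.0},
]

# Each specimen, described by three numbers, becomes a single position on a line.
scores = [discriminant_score(s) for s in specimens]
print(scores)
```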
                 Discriminant function analysis consists of finding a transform which gives the
             maximum ratio of the difference between two group multivariate means to the
             multivariate variance within the two groups. If we regard our two groups as form-
             ing clusters of points in multivariate space, we must search for the one orienta-
             tion along which the two clusters have the greatest separation while each cluster
