Statistics and Data Analysis in Geology - Chapter 6

The Euclidean distance and its square, unfortunately, are expressed as hodgepodges of the original units of measurement. To be interpretable, they must be standardized. Comparison with Equation (6.20) suggests that standardization must involve division by the multivariate equivalent of the variance, which is the variance-covariance matrix $\mathbf{S}$. Of course, division is not a defined operation in matrix algebra, but we can accomplish the same end by multiplying by the inverse. Multiplying Equation (6.24) by the inverse of the variance-covariance matrix yields the standardized squared distance,

$$D^2 = \mathbf{D}'\,\mathbf{S}^{-1}\,\mathbf{D} \tag{6.25}$$
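In numerical work, multiplying by $\mathbf{S}^{-1}$ is usually carried out by solving the linear system $\mathbf{S}\mathbf{x} = \mathbf{D}$ rather than by forming the inverse explicitly. The following is a minimal pure-Python sketch of Equation (6.25) under that approach, assuming $\mathbf{S}$ is supplied as a list of rows and is nonsingular. The function names `solve` and `mahalanobis_sq`, and the numbers in the usage lines, are hypothetical and chosen only for illustration.

```python
def solve(S, d):
    """Solve S x = d by Gaussian elimination with partial pivoting.

    The result equals S^{-1} d, but the inverse is never formed explicitly.
    """
    m = len(d)
    # Augmented matrix [S | d], copied so the caller's lists are not modified
    a = [list(map(float, row)) + [float(d[i])] for i, row in enumerate(S)]
    for col in range(m):
        # Choose the row with the largest pivot to limit round-off error
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            raise ValueError("variance-covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        # Eliminate the entries below the pivot
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m + 1):
                a[r][c] -= f * a[col][c]
    # Back-substitution on the upper-triangular system
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (a[r][m] - s) / a[r][r]
    return x


def mahalanobis_sq(mean_a, mean_b, S):
    """Standardized squared distance D^2 = D' S^{-1} D between two mean vectors."""
    d = [ai - bi for ai, bi in zip(mean_a, mean_b)]  # vector of mean differences D
    x = solve(S, d)                                  # x = S^{-1} D
    return sum(di * xi for di, xi in zip(d, x))      # D' x


# One variable: D^2 = (5 - 3)^2 / 4
print(mahalanobis_sq([5.0], [3.0], [[4.0]]))  # 1.0
# Two correlated variables (hypothetical values)
print(mahalanobis_sq([1.0, 1.0], [0.0, 0.0], [[4.0, 2.0], [2.0, 3.0]]))  # 0.375
```

With a single variable, the sketch reduces to the squared difference between the means divided by the variance, which is the sense in which the standardization "divides by" the variance.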

This standardized measure of difference between the means of two multivariate groups is called Mahalanobis’ distance. Substituting quantities from Table 6-5 into Equation (6.25), we obtain

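$$D^2 = \begin{bmatrix} -0.010 & -0.0431 \end{bmatrix}
\begin{bmatrix} 59{,}098.305 & 4311.640 \\ 4311.640 & 747.058 \end{bmatrix}
\begin{bmatrix} -0.010 \\ -0.0431 \end{bmatrix} = 11.172$$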
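The arithmetic of this triple product can be checked with a short sketch. The values below are copied from the worked example as printed. Because they are rounded for display, the result comes out near 11.01 rather than 11.172; presumably the text's figure was computed from the unrounded quantities in Table 6-5. The helper name `quad_form` is hypothetical.

```python
# Values as printed in the worked example (rounded for display)
D = [-0.010, -0.0431]            # differences between the two group means
S_inv = [[59098.305, 4311.640],  # inverse of the variance-covariance matrix
         [4311.640, 747.058]]


def quad_form(v, M):
    """Return the quadratic form v' M v for a vector v and a square matrix M."""
    Mv = [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
    return sum(v[i] * Mv[i] for i in range(len(v)))


d2 = quad_form(D, S_inv)  # D' S^{-1} D, Equation (6.25)
print(round(d2, 3))       # prints 11.014
```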

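
Interestingly, we can obtain exactly the same distance measure by substituting the vector of mean differences into the discriminant function equation itself. This works because the coefficients of the discriminant function are the product $\mathbf{S}^{-1}\mathbf{D}$, so multiplying them by $\mathbf{D}'$ reproduces Equation (6.25):

$$D^2 = \begin{bmatrix} -0.010 & -0.0431 \end{bmatrix}
\begin{bmatrix} -783.442 \\ -75.602 \end{bmatrix} = 11.172$$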
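A sketch of this second route: form the coefficient vector $\mathbf{S}^{-1}\mathbf{D}$ from the printed matrix, then take its inner product with $\mathbf{D}$. It also evaluates the product with the coefficients exactly as printed. All inputs are the rounded values shown above, and the helper names `mat_vec` and `dot` are hypothetical.

```python
# Values as printed in the worked example (rounded for display)
D = [-0.010, -0.0431]               # differences between the group means
S_inv = [[59098.305, 4311.640],     # inverse variance-covariance matrix
         [4311.640, 747.058]]
printed_coef = [-783.442, -75.602]  # discriminant coefficients as printed


def mat_vec(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def dot(u, v):
    """Inner product u' v."""
    return sum(a * b for a, b in zip(u, v))


coef = mat_vec(S_inv, D)           # coefficients recomputed as S^{-1} D
d2_via_coef = dot(D, coef)         # D' (S^{-1} D), the same number as D' S^{-1} D
d2_printed = dot(D, printed_coef)  # same product using the printed coefficients

print([round(c, 1) for c in coef])                  # [-776.8, -75.3]
print(round(d2_via_coef, 3), round(d2_printed, 3))  # 11.014 11.093
```

The recomputed coefficients differ slightly from the printed -783.442 and -75.602, and both D² values fall a little short of 11.172; these gaps are consistent with rounding of the displayed inputs rather than with any difference between the two routes.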


Mahalanobis’ distance can be visualized in Figure 6-3, where it is equal to the distance between $R_A$ and $R_B$.

The significance of Mahalanobis’ distance can be tested using a multivariate equivalent of the t-test of the equality of two means, called Hotelling’s $T^2$ test. We will discuss this test more extensively in the next section. Here, we simply note that it has the form
$$T^2 = \frac{n_a n_b}{n_a + n_b}\, D^2 \tag{6.26}$$
and can be transformed to an F-test. The test of multivariate equality, using this more familiar statistic, is

$$F = \left(\frac{n_a + n_b - m - 1}{(n_a + n_b - 2)\,m}\right)\left(\frac{n_a n_b}{n_a + n_b}\right) D^2 \tag{6.27}$$
with $m$ and $(n_a + n_b - m - 1)$ degrees of freedom, where $m$ is the number of variables. The null hypothesis tested by this statistic is that the two multivariate means are equal, or that the distance between them is zero. That is,
$$H_0: D = 0$$
against
$$H_1: D > 0$$
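Equations (6.26) and (6.27) can be sketched as two small functions. The sample sizes in the usage line are hypothetical, chosen only for illustration, and the function names are likewise hypothetical. The resulting F would then be compared with a tabulated critical value of F for $m$ and $(n_a + n_b - m - 1)$ degrees of freedom.

```python
def hotelling_t2(d2, n_a, n_b):
    """Hotelling's T^2 from Mahalanobis' D^2, Equation (6.26)."""
    return (n_a * n_b) / (n_a + n_b) * d2


def hotelling_f(d2, n_a, n_b, m):
    """F statistic of Equation (6.27), returned with its degrees of freedom.

    m is the number of variables; the degrees of freedom are
    m and n_a + n_b - m - 1.
    """
    df2 = n_a + n_b - m - 1
    if df2 <= 0:
        raise ValueError("need n_a + n_b > m + 1")
    f = df2 / ((n_a + n_b - 2) * m) * hotelling_t2(d2, n_a, n_b)
    return f, m, df2


# Hypothetical sample sizes (not taken from the text), with D^2 = 11.172 and m = 2
f, df1, df2 = hotelling_f(11.172, n_a=10, n_b=10, m=2)
print(round(f, 3), df1, df2)  # 26.378 2 17
```

Computing F through `hotelling_t2` keeps the two equations visibly linked: Equation (6.27) is just Equation (6.26) rescaled by the degrees-of-freedom factor.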

The appropriateness of this as a test of a discriminant function should be apparent. If the means of the two groups are very close together, it will be difficult to tell them apart, especially if both groups have large variances. In contrast, if the two means are well separated and scatter about the means is small, discrimination will