Page 218 - MATLAB Recipes for Earth Sciences

214                                              9 Multivariate Statistics

            data set can easily be explored by visual inspection of a 2D histogram or an
            xy plot, the graphical display of a three variable data set requires a projection
            of the 3D distribution of data points into 2D. It is impossible to imagine or
            display a higher number of variables. One solution to the problem of
            visualizing high-dimensional data sets is the reduction of dimensionality. A
            number of methods group highly correlated variables contained in the data
            set and then explore a small number of groups.
               The classic methods to reduce dimensionality are the principal compo-
            nent analysis (PCA) and the factor analysis (FA). These methods seek the
            directions of maximum variance in the data set and use these as new coordi-
            nate axes. The advantage of replacing the variables by new groups of vari-
            ables is that the groups are uncorrelated. Moreover, these groups often help
            to interpret the multivariate data set since they often contain valuable infor-
            mation on the process itself that generated the distribution of data points. In a
            geochemical analysis of magmatic rocks, the groups defined by the method
            usually contain chemical elements with similar ion size that are observed in
            similar locations in the lattice of certain minerals. Examples for such behav-
            ior are Si⁴⁺ and Al³⁺, and Fe²⁺ and Mg²⁺ in silicates, respectively.
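
The idea of replacing correlated variables with uncorrelated axes of maximum variance can be sketched with base MATLAB functions alone. The following is a minimal illustration, not the book's own recipe: the data matrix `X` is synthetic and invented here, and the variable names (`Xc`, `V`, `lambda`, `scores`, `explained`) are assumptions chosen for this example. It computes the eigenvectors of the covariance matrix, projects the data onto them, and shows that the covariance of the projected data is diagonal.

```matlab
% Synthetic data (invented for illustration): 50 samples of 3 variables,
% where the second variable is strongly correlated with the first.
t = linspace(-1, 1, 50)';
X = [t, 2*t + 0.1*sin(7*t), cos(5*t)];

% Center each variable on its mean.
Xc = X - repmat(mean(X, 1), size(X, 1), 1);

% Eigen-decomposition of the covariance matrix: the eigenvectors are the
% directions of the new axes, the eigenvalues the variance along them.
C = cov(Xc);
[V, D] = eig(C);

% Sort the axes by decreasing variance.
[lambda, idx] = sort(diag(D), 'descend');
V = V(:, idx);

% Project the data onto the new axes.
scores = Xc * V;

% Percentage of the total variance carried by each new axis.
explained = 100 * lambda / sum(lambda);

% Covariance of the projected data: off-diagonal elements vanish
% (up to rounding), i.e., the new variables are uncorrelated.
Cs = cov(scores);
```

Inspecting `explained` shows how much of the total variance each new axis carries, which is the usual basis for deciding how many axes to keep.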
               The second important suite of multivariate methods aims to group ob-
            jects by their similarity. As an example, cluster analysis (CA) is often
            applied to correlate volcanic ashes as described in the above example.
            Tephrochronology tries to correlate tephra by means of their geochemical
            fingerprint. In combination with a few radiometric age determinations of
            the key ashes, this method allows the correlation of sedimentary sequences that
            contain these ashes (e.g., Westgate 1998, Hermanns et al. 2000). More
            examples for the application of cluster analysis come from the field of
            micropaleontology. In this context, multivariate methods are employed to
            compare microfossil assemblages such as pollen, foraminifera or diatoms
            (e.g., Birks and Gordon 1985).
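
Grouping objects by similarity starts from a measure of how far apart the objects are. The sketch below is a hedged illustration using base MATLAB only: the matrix `data` is hypothetical (five invented samples described by three variables, not real tephra or microfossil measurements), and the Euclidean distance is assumed as the similarity measure. It builds the pairwise distance matrix and finds the two most similar samples, which corresponds to the first merging step of an agglomerative cluster analysis.

```matlab
% Hypothetical data (invented for illustration): 5 samples x 3 variables.
data = [1.0 2.0 3.0
        1.1 2.1 2.9
        5.0 5.0 5.0
        5.2 4.9 5.1
        9.0 1.0 4.0];

% Pairwise Euclidean distances between all samples.
n = size(data, 1);
D = zeros(n);
for i = 1:n
    for j = 1:n
        D(i, j) = sqrt(sum((data(i, :) - data(j, :)).^2));
    end
end

% Ignore the zero distance of each sample to itself.
Dsearch = D;
Dsearch(logical(eye(n))) = Inf;

% The smallest remaining distance identifies the most similar pair.
[minDist, k] = min(Dsearch(:));
[iMin, jMin] = ind2sub([n n], k);
```

Other distance measures and rules for merging groups are possible; the choice made here is only an assumption for the example.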
               The following text introduces the most important techniques of multivari-
            ate statistics, principal component analysis and cluster analysis (Chapters 9.2
            and 9.3). A nonlinear extension of the PCA is the independent component
            analysis (ICA) (Chapter 9.4). Firstly, the chapters provide an introduction to
            the theory behind the techniques. Subsequently, the use of these methods in
            analyzing earth sciences data is illustrated with MATLAB functions.



            9.2 Principal Component Analysis

            The  principal component analysis (PCA) detects linear dependencies be-