Page 228 - MATLAB Recipes for Earth Sciences

224                                              9 Multivariate Statistics

               data = load('sediments.txt');
               for i=1:10
                 sample(i,:) = ['sample',sprintf('%02.0f',i)];
               end
               clear i
               minerals= ['amp';'pyr';'pla';'ksp';'qtz';'cla';'flu';'sph';'gal'];

            Subsequently, the distances between pairs of samples can be computed. The
            function pdist provides many ways of computing this distance, such as
            the Euclidean or Manhattan distance. We use the default setting, which is the
            Euclidean distance.

               Y = pdist(data);

            The function pdist returns a vector Y containing the distances between
            each pair of observations in the original data matrix. We can visualize the
            distances on another pseudocolor plot.

               squareform(Y);
               imagesc(squareform(Y)),colormap(hot)
               title('Euclidean distance between pairs of samples')
               xlabel('First Sample No.')
               ylabel('Second Sample No.')
               colorbar
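To make the layout of the distance vector concrete, the following minimal sketch applies pdist and squareform to a small, made-up matrix X (hypothetical toy data, not the sediment data set). It assumes the same toolbox that provides pdist above, and the variable names Yeuc, Ycit and D are chosen here for illustration only.

```matlab
% Hypothetical toy data: four observations (rows), two variables (columns)
X = [0 0; 3 4; 6 8; 0 1];

% The default metric is Euclidean; 'cityblock' gives the Manhattan distance
Yeuc = pdist(X);
Ycit = pdist(X,'cityblock');

% For n observations, pdist returns n*(n-1)/2 distances as a row vector,
% ordered as the pairs (2,1), (3,1), ..., (n,1), (3,2), ..., (n,n-1)
n = size(X,1);
fprintf('%d observations give %d pairwise distances\n',n,numel(Yeuc))

% squareform arranges the same values in a symmetric n-by-n matrix
% with zeros on the diagonal
D = squareform(Yeuc);
disp(D)
```

With ten samples, as in the sediment example, the same reasoning gives 45 pairwise distances and a 10-by-10 matrix for the pseudocolor plot.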

            The function squareform converts Y into a symmetric, square format, so
            that the element (i,j) of the matrix denotes the distance between
            objects i and j in the original data. Next we rank and link the samples with
            respect to their inverse distance using the function linkage.

               Z = linkage(Y);

            In this 3-column array Z, each row identifies a link. The first two columns
            identify the objects (or samples) that have been linked, and the third column
            contains the individual distance between these two objects. The first row
            (link), between objects (or samples) 1 and 2, has the smallest distance,
            corresponding to the highest similarity. Finally, we visualize the hierarchical
            clusters as a dendrogram, which is shown in Figure 9.4.

               dendrogram(Z);
               xlabel('Sample No.')
               ylabel('Distance')
               box on
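The structure of the linkage output can be illustrated with a small sketch on made-up one-dimensional data (hypothetical values, not the sediment samples). It relies on the default settings of linkage, which use the shortest-distance (single linkage) method, and shows how rows of Z can refer to clusters formed in earlier rows.

```matlab
% Hypothetical toy data: two tight pairs of points that lie far apart
X = [1; 2; 10; 12];

Y = pdist(X);      % Euclidean distances between all pairs
Z = linkage(Y);    % default method: single linkage (nearest neighbor)

% Z has n-1 rows for n objects. Indices 1 to n in the first two columns
% refer to the original objects; the cluster formed in row k receives
% the index n+k, so later rows can link previously formed clusters.
disp(Z)

% Column 3 contains the merge distances. Here objects 1 and 2 join at
% distance 1, objects 3 and 4 at distance 2, and the two resulting
% clusters (indices 5 and 6) at distance 8, the gap between 2 and 10.
```

The dendrogram plots these merge distances on the vertical axis, so well-separated groups appear as branches that join only at large distances.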
            Clustering finds the same groups as the principal component analysis. We
            observe clear groups consisting of samples 1, 2, 8 to 10 (the magmatic