Page 75 - MATLAB Recipes for Earth Sciences

4.2 Pearson's Correlation Coefficient

             rhos1000 = bootstrp(1000,'corrcoef',x,y);


           This command first resamples the data a thousand times, calculates the
           correlation coefficient for each new subsample and stores the result in the
           variable rhos1000. Since corrcoef delivers a 2x2 matrix as mentioned
           above, rhos1000 has the dimension 1000x4, i.e., 1000 values for each
           element of the 2x2 matrix. Plotting the histogram of the 1000 values of
           the second element, i.e., the correlation coefficient of (x,y), illustrates the
           dispersion of this parameter with respect to the presence or absence of the
           outlier. Since the distribution of rhos1000 contains a lot of empty classes,
           we use a large number of bins.

             hist(rhos1000(:,2),30)

           The histogram shows a cluster of correlation coefficients around r=0.2 that
           roughly follow a normal distribution and a strong peak close to r=1 (Fig. 4.3).
           The interpretation of this histogram is relatively straightforward. As soon
           as the subsample contains the outlier, the correlation coefficient is close to
           one. Samples without the outlier yield a very low (close to zero) correlation
           coefficient, suggesting no strong dependence between the two variables
           x and y.
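
           The interpretation above can be checked by resampling by hand and recording
           whether each subsample contains the outlier. The following is only a minimal
           sketch, not the internals of bootstrp. It assumes synthetic data of 30
           uncorrelated points plus one outlier at (5,5); these data and the names
           nboot, idx, rhos and hasOutlier are assumptions made for this illustration.

```matlab
% Synthetic data (an assumption for this sketch): 30 uncorrelated
% points plus a single outlier at (5,5) stored as the last element.
x = [randn(30,1); 5];
y = [randn(30,1); 5];
n = numel(x);
nboot = 1000;

rhos = zeros(nboot,4);          % one flattened 2x2 matrix per row
hasOutlier = false(nboot,1);    % true if the subsample includes point n

for i = 1:nboot
    idx = randi(n,n,1);         % draw n indices with replacement
    R = corrcoef(x(idx),y(idx));
    rhos(i,:) = R(:)';          % columns: R(1,1) R(2,1) R(1,2) R(2,2)
    hasOutlier(i) = any(idx == n);
end

r = rhos(:,2);                  % correlation coefficient of (x,y)
fprintf('Fraction of subsamples with outlier: %.2f\n', mean(hasOutlier))
fprintf('Median r with outlier:    %.2f\n', median(r(hasOutlier)))
fprintf('Median r without outlier: %.2f\n', median(r(~hasOutlier)))
```

           Each row of rhos holds the four elements of one 2x2 matrix, which is why
           columns 1 and 4 are always one and columns 2 and 3 both contain r. For n data
           points, a subsample drawn with replacement contains a given point with
           probability 1-(1-1/n)^n, roughly 0.63 to 0.64 for n of this size, so a
           majority of the subsamples are expected to include the outlier.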




           [Figure: histogram titled "Histogram of Bootstrap Results". x-axis: Correlation
           Coefficient r (-0.5 to 1); y-axis: Bootstrap Samples (0 to 350). Annotations:
           "Low correlation coefficients of samples not containing the outlier" and
           "High correlation coefficients of samples including the outlier".]

           Fig. 4.3 Bootstrap result for Pearson's correlation coefficient r from 1000 subsamples. The
           histogram shows a roughly normally-distributed cluster of correlation coefficients at around
           r=0.2 suggesting that these subsamples do not contain the outlier. The strong peak close to
           r=1, however, suggests that such an outlier with high values of the two variables x and y is
           present in the corresponding subsamples.