Page 74 - MATLAB Recipes for Earth Sciences

66                                                  4 Bivariate Statistics

               ans =
                   1.0000    0.4641
                   0.4641    1.0000

            After increasing the absolute (x,y) values of this outlier, the correlation
            coefficient increases dramatically.

               x(31,1) = 10; y(31,1) = 10;

               plot(x,y,'o'), axis([-1 20 -1 20]);
               corrcoef(x,y)

               ans =
                   1.0000    0.7636
                   0.7636    1.0000
            The coefficient reaches a value close to r=1 if the outlier has a value of
            (x,y)=(20,20).

               x(31,1) = 20; y(31,1) = 20;

               plot(x,y,'o'), axis([-1 20 -1 20]);
               corrcoef(x,y)
               ans =
                   1.0000    0.9275
                   0.9275    1.0000
            Still, the bivariate data set does not provide much evidence for a strong
            dependence. However, the combination of the random bivariate (x,y) data
            with one single outlier results in a dramatic increase of the correlation
            coefficient. Whereas outliers are easy to identify in a bivariate scatter
            plot, erroneous values might be overlooked in large multivariate data sets.
               Various methods exist to calculate the significance of Pearson's
            correlation coefficient. The function corrcoef offers optional outputs,
            such as p-values, for evaluating the quality of the result. Furthermore,
            resampling schemes or surrogates such as the bootstrap or jackknife method
            provide an alternative way of assessing the statistical significance of the
            results. These methods repeatedly resample the original data set with N
            data points, either by leaving out one data point at a time, which yields
            N subsamples of N-1 data points (the jackknife), or by drawing N data
            points at random with replacement (the bootstrap). The statistics of these
            subsamples provide better information on the characteristics of the
            population than the statistical parameters (mean, standard deviation,
            correlation coefficients) computed from the full data set alone. The
            function bootstrp allows resampling of our bivariate data set including
            the outlier (x,y)=(20,20).
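
The three approaches named in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the book's own listing: the data are a synthetic stand-in (30 normally distributed random points plus the outlier), because x and y are defined earlier in the chapter, and the resampling loops are written out by hand with randi and corrcoef instead of calling bootstrp, so that no toolbox is required. The variable names r_full, r_jack and r_boot are illustrative only, and the numbers change from run to run because the random generator is not seeded.

```matlab
% Assumed stand-in data: 30 uncorrelated random points plus one outlier.
x = randn(30,1); y = randn(30,1);
x(31,1) = 20; y(31,1) = 20;
n = length(x);

% Pearson's r and the p-value for the hypothesis of no correlation.
[r,p] = corrcoef(x,y);
r_full = r(1,2); p_full = p(1,2);

% Jackknife: leave out one data point at a time, which gives
% n estimates of r, each computed from n-1 data points.
r_jack = zeros(n,1);
for i = 1:n
    keep = [1:i-1, i+1:n];
    c = corrcoef(x(keep),y(keep));
    r_jack(i) = c(1,2);
end

% Bootstrap: draw n data points with replacement, nboot times.
nboot = 1000;
r_boot = zeros(nboot,1);
for k = 1:nboot
    idx = randi(n,n,1);
    c = corrcoef(x(idx),y(idx));
    r_boot(k) = c(1,2);
end

fprintf('full data set:      r = %.4f (p = %.2g)\n', r_full, p_full);
fprintf('jackknife, outlier omitted: r = %.4f\n', r_jack(n));
fprintf('bootstrap median:   r = %.4f\n', median(r_boot));
```

In this sketch the last jackknife value, r_jack(n), is the estimate with the outlier removed, so comparing it with r_full shows how much a single point contributes. Bootstrap samples that happen to miss the outlier give values of r scattered around zero, so a histogram of r_boot would typically show two clusters instead of one peak.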