Page 73 - MATLAB Recipes for Earth Sciences

4.2 Pearson's Correlation Coefficient

           variables, meters and age. This trend can be described by Pearson's
           correlation coefficient r, where r=1 stands for a perfect positive correlation, i.e.,
           age increases with meters, r=0 suggests no correlation, and r=-1 indicates
           a perfect negative correlation. We use the function corrcoef to compute
           Pearson's correlation coefficient.
             corrcoef(meters,age)

           which yields the output
             ans =
                 1.0000    0.9342
                 0.9342    1.0000
           The function corrcoef calculates a matrix of correlation coefficients
           for all possible combinations of the two variables. The combinations
           (meters, age) and (age, meters) result in r=0.9342, whereas
           (age, age) and (meters, meters) yield r=1.000.
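
           The structure of this matrix can be checked against the textbook definition
           of Pearson's r, i.e., the sum of the products of the deviations from the means
           divided by the square root of the product of the summed squared deviations.
           The following is a minimal sketch only. The vectors depth and agek are
           made-up values introduced for illustration; they are not the meters and
           age data used in the text.

```matlab
% Hypothetical example data (NOT the meters/age values from the text)
depth = [1 2 3 4 5 6]';
agek  = [2.1 3.9 6.2 7.8 10.1 12.2]';

% Deviations of each variable from its mean
dx = depth - mean(depth);
dy = agek  - mean(agek);

% Pearson's r computed directly from its definition
r_manual = sum(dx.*dy) / sqrt(sum(dx.^2) * sum(dy.^2));

% corrcoef returns a 2-by-2 matrix; the off-diagonal element is r
R = corrcoef(depth,agek);
r_builtin = R(1,2);

% Both values should agree to within rounding error
disp([r_manual r_builtin])
```

           The diagonal elements R(1,1) and R(2,2) are the correlations of each
           variable with itself, and R(1,2) equals R(2,1) because the coefficient is
           symmetric in its two arguments.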
             The value of r=0.9342 suggests that the two variables age and meters
           depend on each other. However, Pearson's correlation coefficient is highly
           sensitive to outliers. This can be illustrated by the following example. Let us
           generate a normally-distributed cluster of thirty (x,y) data points with zero mean
           and standard deviation one. To obtain reproducible data values, we reset
           the random number generator by using the integer 5 as seed.

             randn('seed',5);
             x = randn(30,1); y = randn(30,1);
             plot(x,y,'o'), axis([-1 20 -1 20]);

           As expected, the correlation coefficient of these random data is very low.

             corrcoef(x,y)
             ans =
                 1.0000    0.1021
                 0.1021    1.0000
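
           The seeding syntax randn('seed',5) used above is a legacy form. As a
           hedged aside, newer MATLAB releases provide the function rng for the same
           purpose. The sketch below assumes that rng is available in the installed
           release. It selects a different generator, so the random numbers and the
           resulting correlation coefficient will not match the values printed above.

```matlab
% Assumption: rng exists in the installed MATLAB release
rng(5)                               % reset the generator with seed 5
x = randn(30,1); y = randn(30,1);    % thirty (x,y) data points

R = corrcoef(x,y);                   % 2-by-2 correlation matrix
r = R(1,2);                          % off-diagonal element is Pearson's r
disp(r)                              % differs from the 0.1021 shown in the text
```

           Calling rng(5) again before generating x and y reproduces exactly the same
           data set, which is the property the example relies on.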
           Now we introduce a single outlier to the data set, an exceptionally high
           (x,y) value, which is located precisely on the one-to-one line (y=x). The
           correlation coefficient for the bivariate data set including the outlier (x,y)=(5,5)
           is considerably higher than before.

             x(31,1) = 5; y(31,1) = 5;
             plot(x,y,'o'), axis([-1 20 -1 20]);
             corrcoef(x,y)