Page 76 - MATLAB Recipes for Earth Sciences


Bootstrapping therefore represents a powerful and simple tool for accepting or rejecting our first estimate of the correlation coefficient. Applying the above procedure to the synthetic sediment data yields a clear unimodal Gaussian distribution of the correlation coefficients.

               corrcoef(meters,age)
               ans =
                   1.0000    0.9342
                   0.9342    1.0000

               rhos1000 = bootstrp(1000,'corrcoef',meters,age);
               hist(rhos1000(:,2),30)

Most values of rhos1000 fall within the interval between 0.88 and 0.98. Since the resampled correlation coefficients are obviously Gaussian distributed, we can use their mean as a good estimate of the true correlation coefficient.

               mean(rhos1000(:,2))
               ans =
                   0.9315
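The interval quoted above can also be checked directly by taking percentiles of the bootstrap replicates; a brief sketch, assuming the Statistics Toolbox function prctile is available:

```matlab
% 95% percentile bootstrap confidence interval
% for the correlation coefficient
prctile(rhos1000(:,2),[2.5 97.5])
```

The two returned values bracket the central 95% of the resampled correlation coefficients.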


This value is not very different from our first result of r = 0.9342, but now we can be more confident in its validity. In our example, however, the bootstrap distribution of the correlations from the age-depth data is quite skewed, since correlation coefficients have a hard upper limit of one. Nevertheless, the bootstrap method is a valuable tool for assessing the reliability of Pearson's correlation coefficient for bivariate data sets.
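The skewness visible in the histogram can also be quantified numerically; a minimal sketch, assuming the Statistics Toolbox function skewness is on the path:

```matlab
% A negative value indicates the left-skewed tail caused by
% the hard upper limit of one on correlation coefficients
skewness(rhos1000(:,2))
```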



            4.3 Classical Linear Regression Analysis and Prediction


Linear regression provides another way of describing the dependence between the two variables x and y. Whereas Pearson's correlation coefficient provides only a rough measure of a linear trend, linear models obtained by regression analysis allow us to predict y values for any given value of x within the data range. Statistical testing of the significance of the linear model provides some insight into the quality of the prediction.
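As a preview of how such a linear model is used for prediction, here is a minimal sketch with the age-depth variables from the previous section; polyfit fits a first-degree polynomial and polyval evaluates it (the depth value of 5 meters is an arbitrary illustration):

```matlab
% fit a straight line age = p(1)*meters + p(2)
p = polyfit(meters,age,1);

% predict the age at an arbitrary depth within the data range
agepred = polyval(p,5);
```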
Classical regression assumes that y responds to x and that the entire dispersion in the data set is in the y-values (Fig. 4.4). In this case, x is the independent, regressor, or predictor variable. The values of x are defined by the experimenter and are often regarded as free of errors. An example is the location x of a sample in a sediment core. The dependent variable y contains