Page 45 - MATLAB Recipes for Earth Sciences

36                                                 3 Univariate Statistics


            lustrated by means of examples. The text and binary files used in the
            following chapters are on the CD that comes with this book. We recommend
            saving the files in your personal working directory.


            3.3 Example of Empirical Distributions


            Let us describe the data contained in the file organicmatter_one.txt. This
            file contains the organic matter content (in weight percent, wt%) of lake
            sediments. To load the data, type

               corg = load('organicmatter_one.txt');

            The data file consists of 60 measurements that can be displayed by

               plot(corg,zeros(1,length(corg)),'o')

            This graph demonstrates some of the characteristics of the data. The organic
            carbon content of the samples ranges between 9 and 15 wt%. Most data cluster
            between 12 and 13 wt%. Values below 10 and above 14 wt% are rare. While
            this kind of representation of the data has its advantages, univariate data
            are generally displayed as histograms:

               hist(corg)

            By default, the MATLAB function hist divides the range of the data into
            ten equal intervals or classes, counts the observations within each interval
            and displays the frequency distribution as a bar plot. The midpoints of the
            default intervals v and the number of observations n per interval can be
            accessed using

               [n,v] = hist(corg);
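
            The relationship between n and v can be checked with a minimal sketch. It
            assumes a synthetic stand-in for corg (60 evenly spaced values between 9 and
            15, not the measurements in organicmatter_one.txt), and the variable names
            binwidth, total and spacing are introduced here only for illustration.

```matlab
% Hypothetical stand-in for corg: 60 evenly spaced values, NOT the book's data.
corg = linspace(9, 15, 60)';

% Counts n and class midpoints v for the default ten classes.
[n, v] = hist(corg);

% Each class spans one tenth of the data range.
binwidth = (max(corg) - min(corg)) / 10;

% Every observation falls into exactly one class.
total = sum(n);

% Neighbouring midpoints are one class width apart, and the first
% midpoint lies half a class width above the minimum.
spacing = diff(v);
firstmid = min(corg) + binwidth/2;
```

            With these assumed values, total should equal length(corg), and all elements
            of spacing should equal binwidth up to rounding error.
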

            For practical purposes, the number of classes should be no lower than six
            and no higher than fifteen. In practice, the square root of the number of
            observations, rounded to the nearest integer, is often used as the number of
            classes. In our example, the square root of 60 is approximately 7.7, so we
            use eight classes instead of the default ten.

               hist(corg,8)
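
            The square-root rule can also be computed rather than typed in by hand. The
            following is a minimal sketch, again assuming a synthetic stand-in for corg
            with 60 values; the variable name nclasses is hypothetical.

```matlab
% Hypothetical stand-in for corg: 60 evenly spaced values, NOT the book's data.
corg = linspace(9, 15, 60)';

% Square root of the number of observations, rounded to the nearest integer.
nclasses = round(sqrt(length(corg)));

% Counts and midpoints for that number of classes.
[n, v] = hist(corg, nclasses);
```

            For 60 observations this gives nclasses = 8, the value used in the call
            hist(corg,8).
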

            We can even define the midpoint values of the histogram classes. In
            this case, it is recommended to choose interval endpoints that avoid
            data points falling between two intervals. The maximum and minimum