Page 47 - MATLAB Recipes for Earth Sciences

38                                                 3 Univariate Statistics

            alternative measure of central tendency.

                 median(corg)
               ans =
                   12.4712

            which is not much different in this example. However, we will see later that
            this difference can be significant for distributions that are not symmetric
            with respect to the arithmetic mean. A more general parameter to define
            fractions of the data less than or equal to a certain value is the quantile.
            Some of the quantiles have special names, such as the three quartiles dividing
            the distribution into four equal parts, 0-25%, 25-50%, 50-75% and 75-100% of
            the total number of observations.
                 prctile(corg,[25 50 75])

               ans =
                   11.4054   12.4712   13.2965
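
            The difference between mean and median, and the link between the median
            and the second quartile, can be tried on a small example. The sketch below
            uses a hypothetical, right-skewed sample x (not the corg data) and assumes
            that the Statistics Toolbox is available for prctile.

```matlab
% Hypothetical right-skewed sample, chosen for illustration only
x = [1 2 2 3 3 4 5 20];

m  = mean(x);      % arithmetic mean: 5, pulled upward by the value 20
md = median(x);    % median: 3, unaffected by the single large value

% Three quartiles; the second one is the median
% (prctile is part of the Statistics Toolbox)
q = prctile(x,[25 50 75]);
```

            For a symmetric sample m and md would nearly coincide. Here the single
            large observation shifts the mean but not the median.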

            The third parameter in this context is the mode, which is the midpoint of the
            interval with the highest frequency. MATLAB does not provide a function
            to compute the mode. We use the function find to locate the class that has
            the largest number of observations.
               v(find(n == max(n)))

               ans =
                   11.9500   12.6000   13.2500
            This statement simply identifies the largest element in n. The index of this
            element is then used to display the midpoint of the corresponding class v. If
            several elements of n share this maximum value, the statement returns several
            solutions, suggesting that the distribution has several modes. The median,
            quartiles, maximum and minimum of a data set can be summarized and
            displayed in a box and whisker plot.

                 boxplot(corg)
            The boxes have lines at the lower quartile, median, and upper quartile val-
            ues. The whiskers are lines extending from each end of the boxes to show
            the extent of the rest of the data.
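
            The mode lookup and the values summarized by the box plot can also be
            computed numerically. The following is a minimal sketch on a hypothetical
            sample x (not the corg data). The choice of four classes is an assumption
            made for this illustration, and prctile again requires the Statistics
            Toolbox.

```matlab
% Hypothetical sample, chosen for illustration only
x = [1 2 2 3 3 4 5 20];

% Class counts n and class midpoints v for four equally wide classes
[n,v] = hist(x,4);

% Midpoint(s) of the class(es) with the largest number of observations;
% logical indexing gives the same result as v(find(n == max(n)))
modes = v(n == max(n));

% Minimum, three quartiles and maximum of the sample
fivenum = [min(x) prctile(x,[25 50 75]) max(x)];
```

            Note that boxplot may, depending on its settings, draw extreme values as
            separate outlier symbols, in which case the whisker ends need not coincide
            with min(x) and max(x).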
               The most popular measures of dispersion are range, standard deviation
            and variance. We have already used the range to define the midpoints of the
            classes. The variance is the average squared deviation of each number from
            the mean of a data set