Page 49 - MATLAB Recipes for Earth Sciences

3 Univariate Statistics

            NaNs result in NaNs, whereas the function nanmean simply skips the
            missing value and computes the mean of the remaining data. As a second
            example, we now explore a data set characterized by a significant skew. The
            data represent 120 microprobe analyses on glass shards hand-picked from a
            volcanic ash. The volcanic glass has been affected by chemical weathering
            in an initial stage. Therefore, the glass shards show glass hydration and
            sodium depletion in some sectors. We study the distribution of sodium contents
            (in wt%) in the 120 measurements using the same principle as above.
               sodium = load('sodiumcontent.txt');
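Before computing statistics on a freshly loaded vector, it can help to check for missing values, since mean propagates NaNs. The following is a minimal sketch using a short made-up vector x (hypothetical values, not the sodium data). It relies only on base MATLAB functions, whereas nanmean requires the Statistics Toolbox.

```matlab
% Hypothetical example vector with one missing value (not the sodium data)
x = [5.2 6.1 NaN 5.9 6.4];

% Count the missing values
nmissing = sum(isnan(x));

% mean propagates the NaN, so the result is NaN
m_all = mean(x);

% Skip the NaN by logical indexing; this mirrors what nanmean does
m_valid = mean(x(~isnan(x)));
```

If nmissing is zero, mean and nanmean give the same result and either can be used.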


            As a first step, it is always recommended to visualize the data as a histo-
            gram. The square root of 120 suggests 11 classes, therefore we display the
            data by typing

               hist(sodium,11)
               [n,v] = hist(sodium,11);
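The class count of 11 follows from rounding the square root of the number of observations. A minimal sketch of that rule is given here, using normally distributed random numbers as a stand-in for the measurements (an assumption made purely for illustration, so the shape of the histogram will not resemble the sodium data).

```matlab
% Stand-in data: 120 random values (hypothetical, not the sodium data)
data = 6 + 0.5*randn(120,1);

% Square-root rule for the number of histogram classes
nclasses = round(sqrt(length(data)));

% Counts n and class centers v for that number of classes
[n,v] = hist(data,nclasses);
```

Here n holds the number of observations per class and v the corresponding class centers, which is why both outputs are stored for later use.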

            Since the distribution has a negative skew, the mean, median and mode are
            significantly different.

               mean(sodium)

               ans =
                   5.6628

               median(sodium)

               ans =
                   5.9741

               v(find(n == max(n)))

               ans =
                   6.5407

            The mean of the data is lower than the median, which is in turn lower than
            the mode. We observe a strong negative skew as expected from our data.
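The ordering mean < median < mode can be reproduced with a small made-up sample that has a tail toward low values. The sketch below uses hypothetical numbers (not the sodium data) and also shows the histogram-based mode lookup with logical indexing, which is assumed here to be an equivalent alternative to the find call.

```matlab
% Hypothetical left-skewed sample (not the sodium data)
x = [1 4 5 6 6 7 7 7];

m  = mean(x);      % 5.375
md = median(x);    % 6
mo = mode(x);      % 7

% Hypothetical class counts n and class centers v, as returned by hist
n = [2 5 9 4];
v = [1 2 3 4];

% Center of the most populated class; logical indexing makes find optional
vmode = v(n == max(n));   % 3
```

If two or more classes share the maximum count, the last expression returns all of their centers, so the result is not necessarily a scalar.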

               skewness(sodium)
               ans =
                   -1.1086
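The function skewness belongs to the Statistics Toolbox. As a rough sketch, the same quantity can be computed from its definition as the third central moment divided by the 3/2 power of the second central moment, both normalized by N. This is assumed to correspond to the default (not bias-corrected) behavior of skewness. The vector x is a made-up left-skewed sample, not the sodium data.

```matlab
% Hypothetical left-skewed sample (not the sodium data)
x = [1 4 5 6 6 7 7 7];

% Deviations from the mean
d = x - mean(x);

% Second and third central moments, normalized by N
m2 = mean(d.^2);
m3 = mean(d.^3);

% Skewness; negative because the long tail lies on the low-value side
s = m3 / m2^(3/2);
```

A negative value of s indicates a tail toward low values, as in the sodium data, and a positive value indicates a tail toward high values.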

            Now we introduce a significant outlier to the data and explore its impact on
            the statistics of the sodium contents. We use a different data set contained
            in the file sodiumcontent_two.txt, which is better suited for this example
            than the previous data set. The new data set contains higher sodium values