Page 42 - MATLAB Recipes for Earth Sciences

3.2 Empirical Distributions

           is used to take the mean of asymmetric or log-normally distributed data,
           similar to the geometric mean, but neither is robust to outliers. The
           harmonic mean is a better average when the numbers are defined in relation
           to some unit. The common example is averaging velocity. The harmonic
           mean is also used to calculate the mean of sample sizes.
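
The velocity example can be sketched in a few lines. The values below are hypothetical (two legs of equal length travelled at 40 and 60 km/h), and the harmonic mean is computed from its definition using only base MATLAB functions rather than any toolbox routine.

```matlab
% Hypothetical speeds (km/h) over two legs of equal distance
v = [40 60];
N = length(v);

% Harmonic mean from its definition: N divided by the sum of reciprocals
hm = N / sum(1 ./ v);      % about 48 km/h, the true average speed

% Arithmetic mean for comparison
am = mean(v);              % 50 km/h, which overestimates the average speed
```

Because more time is spent on the slower leg, the harmonic mean is smaller than the arithmetic mean in this example.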


           Measures of Dispersion


           Another important property of a distribution is the dispersion. Some of the
           parameters that can be used to quantify dispersion are illustrated in Figure
           3.3. The simplest way to describe the dispersion of a data set is the range,
           which is the difference between the highest and lowest values in the data set,
           given by

           $$\Delta x = x_{\max} - x_{\min}$$

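As a minimal sketch, the range can be computed directly from the largest and smallest values. The data vector here is hypothetical, and the second half of the example appends one extreme value to show how strongly a single data point can change the result.

```matlab
% Hypothetical measurements
x = [2.1 3.4 2.8 3.0 2.5];

% Range: difference between the highest and lowest values
dx = max(x) - min(x);        % about 1.3

% The same data with one extreme value appended
xo = [x 9.7];
dxo = max(xo) - min(xo);     % about 7.6
```
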
           Since the range is defined by the two extreme data points, it is very susceptible
           to outliers. Hence, it is not a reliable measure of dispersion in most cases.
           The interquartile range of the data, i.e., the range of the middle 50% of the
           data, attempts to overcome this. A very useful measure of dispersion is the
           standard deviation

           $$s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}$$

           The standard deviation is the average deviation of each data point from
           the mean. The standard deviation of an empirical distribution is often used
           as an estimate for the population standard deviation σ. The formula of the
           population standard deviation uses N instead of N-1 in the denominator.
           The sample standard deviation s is computed with N-1 instead of N since it
           uses the sample mean instead of the unknown population mean. The sample
           mean, however, is computed from the data $x_i$, which reduces the degrees
           of freedom by one. The degrees of freedom are the number of values in a
           distribution that are free to be varied. Dividing the sum of squared deviations
           of the data from the sample mean by N would therefore underestimate the
           population standard deviation σ.
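
The difference between the two denominators can be illustrated with a short sketch. The sample below is hypothetical; the standard deviation is first computed by hand with N-1 and with N, and then compared with the built-in function std, whose second argument selects the normalization (std(x) or std(x,0) divides by N-1, std(x,1) divides by N).

```matlab
% Hypothetical sample
x = [2 4 4 4 5 5 7 9];
N = length(x);
xbar = mean(x);

% Sum of squared deviations from the sample mean
ss = sum((x - xbar).^2);

% Sample standard deviation (N-1 in the denominator)
s_sample = sqrt(ss / (N - 1));

% Same formula with N in the denominator
s_n = sqrt(ss / N);

% Built-in equivalents
s_builtin = std(x);        % normalizes by N-1
s_builtin_n = std(x, 1);   % normalizes by N
```

For any sample with more than one distinct value, s_n is smaller than s_sample, which is the underestimation described above.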
             The variance is the third important measure of dispersion. The variance
           is simply the square of the standard deviation.
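
A brief check of this relationship, using a hypothetical sample and the built-in functions var and std (both normalize by N-1 by default):

```matlab
% Hypothetical sample
x = [2 4 4 4 5 5 7 9];

v = var(x);            % sample variance
s = std(x);            % sample standard deviation

% Difference between the variance and the squared standard deviation;
% this is zero up to floating-point rounding
d = abs(v - s^2);
```
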