Page 42 - MATLAB Recipes for Earth Sciences

3.2 Empirical Distributions

is used to take the mean of asymmetric or log-normally distributed data,
similar to the geometric mean, but neither is robust to outliers. The
harmonic mean is a better average when the numbers are defined in relation
to some unit. The common example is averaging velocity. The harmonic
mean is also used to calculate the mean of sample sizes.
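
The velocity example can be sketched in a few lines of MATLAB. The speeds
and the leg length below are made-up values, and the harmonic mean is
computed directly from its definition rather than with a toolbox function.
For two legs of equal length, the harmonic mean of the speeds matches total
distance divided by total time, whereas the arithmetic mean does not.

```matlab
% Two legs of equal length travelled at different speeds (made-up values).
v = [60 40];                      % speeds in km/h

vArith = sum(v) / numel(v);       % arithmetic mean of the speeds
vHarm  = numel(v) / sum(1 ./ v);  % harmonic mean of the speeds

% Check against total distance divided by total time,
% assuming each leg is d = 120 km long (hypothetical value).
d     = 120;
vTrue = (2 * d) / sum(d ./ v);    % 240 km travelled in 2 h + 3 h

fprintf('arithmetic: %.1f, harmonic: %.1f, true: %.1f km/h\n', ...
    vArith, vHarm, vTrue);
```

The harmonic mean and the distance-over-time check both give 48 km/h, while
the arithmetic mean gives 50 km/h.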

Measures of Dispersion

Another important property of a distribution is the dispersion. Some of the
parameters that can be used to quantify dispersion are illustrated in Figure
3.3. The simplest way to describe the dispersion of a data set is the range,
which is the difference between the highest and lowest values in the data
set, given by

$$\Delta x = x_{\max} - x_{\min}$$

Since the range is defined by the two extreme data points, it is very
susceptible to outliers. Hence, it is not a reliable measure of dispersion
in most cases. Using the interquartile range of the data, i.e., the range
of the middle 50% of the data, attempts to overcome this. A very useful
measure of dispersion is the standard deviation.
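
The sensitivity of the range to a single outlier can be illustrated with a
short sketch. The data are made up, and the helper iqrHalves is a
hypothetical name introduced only for this example. It takes the quartiles
as the medians of the lower and upper halves of the sorted data, which is
only one of several quartile conventions, so toolbox functions may return
slightly different values.

```matlab
% Made-up sample; in xOut the last value is replaced by an outlier.
x    = [2 4 4 5 6 7 8 9];
xOut = [x(1:end-1) 90];

% Range: difference between the highest and lowest value.
rangeX    = max(x) - min(x);
rangeXOut = max(xOut) - min(xOut);

% Interquartile range from the medians of the lower and upper halves
% of the sorted data (assumes an even number of samples).
iqrHalves = @(s) median(s(numel(s)/2+1:end)) - median(s(1:numel(s)/2));
iqrX      = iqrHalves(sort(x));
iqrXOut   = iqrHalves(sort(xOut));

fprintf('range: %g -> %g, IQR: %g -> %g\n', ...
    rangeX, rangeXOut, iqrX, iqrXOut);
```

With these values the range jumps from 7 to 88 when the outlier is
introduced, while the interquartile range stays at 3.5.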

The standard deviation is the average deviation of each data point from
the mean. The standard deviation of an empirical distribution is often used
as an estimate of the population standard deviation σ. The formula for the
population standard deviation uses N instead of N-1 in the denominator.
The sample standard deviation s is computed with N-1 instead of N since it
uses the sample mean instead of the unknown population mean. The sample
mean, however, is itself computed from the data x_i, which reduces the
degrees of freedom by one. The degrees of freedom are the number of values
in a distribution that are free to vary. Normalizing the deviations of the
data from the sample mean by N would therefore tend to underestimate the
population standard deviation σ.
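
The difference between the two denominators can be checked numerically. The
following is a minimal sketch with made-up data. It computes both versions
from the sum of squared deviations and compares them with the built-in
function std, which normalizes by N-1 by default and by N when called as
std(x,1).

```matlab
x    = [2 4 4 5 6 7 8 9];       % made-up sample
N    = numel(x);
xbar = sum(x) / N;              % sample mean

ss = sum((x - xbar).^2);        % sum of squared deviations from the mean

sSample = sqrt(ss / (N - 1));   % N-1 in the denominator
sPop    = sqrt(ss / N);         % N in the denominator

% Differences to the built-in function should be at rounding level.
fprintf('s (N-1): %.4f, difference to std(x):   %.1e\n', ...
    sSample, abs(sSample - std(x)));
fprintf('s (N):   %.4f, difference to std(x,1): %.1e\n', ...
    sPop, abs(sPop - std(x, 1)));
```

The value obtained with N in the denominator is always the smaller of the
two, and the gap shrinks as N grows.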
The variance is the third important measure of dispersion. The variance
is simply the square of the standard deviation.
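
Written out with the symbols used in the text (N data points x_i and the
N-1 denominator), the sample variance and its relation to the sample
standard deviation s can be reconstructed as follows. The notation
\bar{x} for the sample mean is an assumption, since no symbol for it
appears on this page.

```latex
s^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2,
\qquad
s = \sqrt{s^2}
```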
