Page 47 - Statistics and Data Analysis in Geology

Statistics and Data Analysis in Geology - Chapter 2

             Summary Statistics

             The most obvious measure of a population or sample is some type of average value.
             Several measures exist, but only a few are used in practice. The mode is the value
             that occurs with the greatest frequency. In an asymmetric distribution such as that
             shown in Figure 2-16, the mode is the highest point on the frequency curve. The
             median is the value midway in the frequency distribution. In Figure 2-16, one-half
             of the area below the distribution curve is to the right of the median and one-half
             is to the left. The median is the 50th percentile, the 5th decile, or the 2nd quartile.
             The mean is another word for the arithmetic average, and is defined as the sum of
             all observations divided by the number of observations. The geometric mean is the
             nth root of the product of the n observations, or equivalently, the exponential of
             the arithmetic mean of the logarithms of the observations. In asymmetric frequency
             curves, the median lies between the mean and the mode. In symmetric curves such
             as the normal distribution, the mean, median, and mode coincide.
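
             The definitions above can be sketched in a few lines of Python. This is a
             minimal illustration, not part of the original text: the observations are
             made-up values chosen to be skewed to the right, and the variable names are
             assumptions. The geometric mean is computed both ways to show that the two
             definitions agree.

```python
import math
import statistics

# Hypothetical right-skewed observations (illustrative values only).
observations = [1.0, 2.0, 2.0, 3.0, 4.0, 8.0, 16.0]
n = len(observations)

# Mode: the value that occurs with the greatest frequency.
mode = statistics.mode(observations)

# Median: the middle value of the ordered observations (50th percentile).
median = statistics.median(observations)

# Mean: sum of all observations divided by the number of observations.
mean = sum(observations) / n

# Geometric mean, definition 1: nth root of the product of the n observations.
geo_mean_root = math.prod(observations) ** (1.0 / n)

# Geometric mean, definition 2: exponential of the arithmetic mean of the logarithms.
geo_mean_log = math.exp(sum(math.log(x) for x in observations) / n)

print(mode, median, mean, geo_mean_root, geo_mean_log)
```

             For these values the mode is smaller than the median, which in turn is smaller
             than the mean, the ordering expected for a distribution with a long right tail.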

              Figure 2-16. Asymmetric distribution showing relative positions of mean, median, and
                   mode.

                  Certain symbols traditionally have been assigned to measures of distribution
              curves. Generally, the symbols for population distributions are Greek letters, and
              those for sample distributions are Roman. The sample mean, for example, is
              designated X̄ (X-bar) and the population mean is μ (mu). A common objective in an
              investigation is to estimate some parameter of a population. A statistic we compute
              based on a sample taken from the population is used as an estimator of the
              desired parameter. The use of Greek and Roman symbols serves to emphasize the
              difference between parameters and the equivalent statistics.
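
              The distinction between a parameter and a statistic can be made concrete with a
              small sketch. Everything in it is an assumption for illustration: the "population"
              is a short list of made-up values treated as if it were completely known, and the
              sample is simply every second value. In practice the population is not fully
              observed, which is why the statistic is needed as an estimator.

```python
import statistics

# Hypothetical finite population of eight measurements (illustrative values only).
population = [12.0, 15.0, 9.0, 20.0, 14.0, 11.0, 18.0, 13.0]

# Parameter: the population mean, written with the Greek letter mu.
mu = statistics.mean(population)

# A sample taken from the population (here, every second value, for simplicity).
sample = population[::2]

# Statistic: the sample mean, written with the Roman symbol X-bar.
x_bar = statistics.mean(sample)

# The statistic estimates the parameter but generally does not equal it.
estimation_error = x_bar - mu

print(mu, x_bar, estimation_error)
```

              A different sample from the same population would generally give a different
              value of the statistic, while the parameter stays fixed.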
                  The sample mean has two highly desirable properties that make it more useful
              as an estimator of the average or central value of a population than either the
              sample median or mode. First, the sample mean is an unbiased estimate of the
              population mean. A (sample) statistic is an unbiased estimate of the equivalent
              (population) parameter if the average value of the statistic, from a large series of
              samples, is equal to the parameter. Second, it can be demonstrated that, for
              symmetrical distributions such as the normal, the sample mean tends to be closer to the
              population mean than any other unbiased estimate (such as the median) based on
              the same sample. This is equivalent to saying that sample means are less variable
