Page 50 - MATLAB Recipes for Earth Sciences

           of around 17 wt% and is stored in the file

             sodium = load('sodiumcontent_two.txt');
           This data set contains only 50 measurements, which makes the effect of an
           outlier easier to see. We can use the script from the previous example to
           display the data in a histogram and to compute the number of observations n
           in each class v. The mean of the data is 16.6379, the median is 16.9739 and
           the mode is 17.2109. We now add a single value of 1.5 wt% to the 50
           measurements contained in the original data set.
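The script from the previous example is not reproduced on this page. A minimal sketch of the same steps might look like the following. It assumes ten class centres v spanning the range of the data, takes the mode as the centre of the most populated class, and replaces the data file with 50 deterministic stand-in values near 17 wt%. All three are assumptions for illustration, not the book's choices, so the printed numbers will not match those quoted above.

```matlab
% Hypothetical stand-in for sodiumcontent_two.txt: 50 deterministic
% values scattered around 17 wt% (NOT the measured data)
sodium = 17 + 0.8*sin((1:50)');

% Assumed class centres v: ten classes spanning the data range
v = linspace(min(sodium), max(sodium), 10);

% Number of observations n in each class v
n = hist(sodium, v);
% bar(v, n)   % uncomment to display the histogram

% Sample statistics; here the mode is taken as the centre of the
% most populated class (an assumption about the book's script)
mu = mean(sodium);
md = median(sodium);
mo = v(find(n == max(n), 1));
fprintf('mean %.4f, median %.4f, mode %.4f\n', mu, md, mo)
```

With the real file, replace the first line with `sodium = load('sodiumcontent_two.txt');`. The mean and median do not depend on the classes, but the histogram mode does, so it changes with the choice of v.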

             sodium(51,1) = 1.5;

           The histogram of this data set illustrates how this single outlier distorts
           the frequency distribution: the corresponding histogram shows several empty
           classes. The influence of this outlier on the sample statistics is
           substantial. Whereas the median of 16.9722 is relatively unaffected, the
           mode of 17.0558 is slightly different because the classes have changed. The
           most significant change is observed in the mean (16.3411), which is very
           sensitive to outliers.
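The outlier experiment can be reproduced in outline with the sketch below. It reuses the same hypothetical stand-in data and the same assumed rule of ten classes spanning the data range, recomputed after the outlier is added. Because the classes stretch down to 1.5 wt%, most of them end up empty, and the mean shifts far more than the median.

```matlab
% Hypothetical stand-in for the 50 measurements (NOT the measured data)
sodium = 17 + 0.8*sin((1:50)');

% Statistics before adding the outlier (assumed ten classes)
v1 = linspace(min(sodium), max(sodium), 10);
n1 = hist(sodium, v1);
stats1 = [mean(sodium) median(sodium) v1(find(n1 == max(n1), 1))];

% Append a single outlier of 1.5 wt% as the 51st value
sodium(51,1) = 1.5;

% Classes recomputed over the new, much wider range
v2 = linspace(min(sodium), max(sodium), 10);
n2 = hist(sodium, v2);
stats2 = [mean(sodium) median(sodium) v2(find(n2 == max(n2), 1))];

fprintf('          mean     median   mode\n')
fprintf('before %8.4f %8.4f %8.4f\n', stats1)
fprintf('after  %8.4f %8.4f %8.4f\n', stats2)
fprintf('empty classes after: %d\n', sum(n2 == 0))
```

The median barely moves because adding one extreme value only shifts the middle position of the sorted data by half a rank. The mean, by contrast, is pulled toward the outlier by (1.5 minus the old mean)/51.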




           3.4 Theoretical Distributions

           We have now described the empirical frequency distribution of our sample.
           A histogram is a convenient way to picture the probability distribution of
           the variable x. If we sample the variable sufficiently often and the class
           intervals are narrow, we obtain a very smooth version of the histogram. An
           infinite number of measurements (N → ∞) and an infinitesimally small class
           width produce the random variable's probability density function (PDF). The
           probability density f(x) describes how probability is distributed over the
           values of x: the probability that the variate takes a value between x and
           x + dx is f(x) dx. The integral of f(x) is normalized to unity, i.e., the
           total probability over all possible values is one. The cumulative
           distribution function (CDF) is the sum of a discrete PDF or the integral of
           a continuous PDF. The cumulative distribution function F(x) is the
           probability that the variable takes a value less than or equal to x.
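The link between a histogram, the PDF and the CDF can be checked numerically. The sketch below uses an assumed sample rather than measured data: 1000 evenly spaced quantiles of a standard normal variable, generated with `erfinv` so that no toolbox or random seed is needed. It scales the class counts so that the bars have unit area, which gives a discrete estimate of f(x). It then accumulates them to estimate F(x) and compares the estimate with the exact normal PDF.

```matlab
% Hypothetical sample: N evenly spaced quantiles of a standard
% normal variable (deterministic, so no random seed is needed)
N = 1000;
p = ((1:N)' - 0.5) / N;
x = sqrt(2) * erfinv(2*p - 1);

% Class centres v with class width dx
dx = 0.25;
v = -4:dx:4;
n = hist(x, v);

% Scale counts so that sum(f)*dx = 1, i.e. unit area under f
f = n / (N * dx);

% Estimate of F(x) at the upper class boundaries: running sum of f*dx
F = cumsum(f) * dx;

% Compare with the exact standard normal PDF at the class centres
pdfn = exp(-v.^2 / 2) / sqrt(2*pi);
maxdiff = max(abs(f - pdfn));

fprintf('area under f: %.4f\n', sum(f) * dx)
fprintf('F at last class: %.4f\n', F(end))
fprintf('max |f - pdf|: %.4f\n', maxdiff)
```

Halving dx while increasing N makes f approach the smooth curve, which is the limiting process described above.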

             As a next step, we have to find a suitable theoretical distribution that
           fits the empirical distributions described in the previous sections. In
           this section, the most common theoretical distributions are introduced and
           their application is described.