A FEW STATISTICAL MEASURES FOR USE IN SEDIMENTARY PETROLOGY
Introduction. A good knowledge of statistics is becoming essential for anyone who
wishes to work in any of the sciences, because the whole of scientific work, from laying
out the experiment to interpreting the data, is based on statistics. Trying to use
numerical data without a knowledge of statistics is like trying to drive without a brake.
You never know where you will end up, and the odds are you will end up in the wrong
place and draw the wrong conclusion.
In sedimentary petrography, statistics are used in laying out the sampling
program; in determining the best experimental technique for analysis; in collecting the
analytical data; and in drawing correct geological conclusions, such as: What is the
feldspar content of formation X? Within what limits am I certain this value is
correct? What is the spread of values to be expected? Does formation X have more
feldspar than formation Y, and how confident am I of this? Does its heavy-mineral
content differ significantly from that of formation Y? What is the relation between
grain size and zircon content, expressed mathematically?
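As a purely illustrative aside, not part of the original outline: questions of this sort
reduce to routine computations once the data are in hand. The short Python sketch below,
using invented point-count numbers and hypothetical formation and variable names, shows
one way they might be attacked with a confidence interval, a two-sample t-test, and a
simple regression.

    # A minimal sketch, with invented data, of how such questions might be answered.
    # All sample values, formation names, and variable names are hypothetical.
    import numpy as np
    from scipy import stats

    # Hypothetical point counts: percent feldspar in samples from formations X and Y.
    feldspar_x = np.array([18.2, 21.5, 19.7, 22.1, 20.4, 17.9, 23.0, 19.3])
    feldspar_y = np.array([14.1, 16.8, 15.2, 13.7, 17.4, 15.9])

    # Average feldspar content of formation X, with 95% confidence limits on the mean.
    mean_x = feldspar_x.mean()
    lo, hi = stats.t.interval(0.95, len(feldspar_x) - 1,
                              loc=mean_x, scale=stats.sem(feldspar_x))
    print(f"Formation X feldspar: {mean_x:.1f}%  (95% limits {lo:.1f}-{hi:.1f}%)")

    # Does formation X have more feldspar than formation Y, and how confident are we?
    t_stat, p_val = stats.ttest_ind(feldspar_x, feldspar_y, equal_var=False)
    print(f"t = {t_stat:.2f}, p = {p_val:.4f}")

    # Relation between grain size and zircon content, expressed mathematically
    # as a least-squares line fitted to hypothetical measurements.
    grain_size_phi = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 4.5])
    zircon_pct = np.array([0.2, 0.4, 0.5, 0.9, 1.1, 1.4])
    fit = stats.linregress(grain_size_phi, zircon_pct)
    print(f"zircon % = {fit.slope:.2f} * size(phi) + {fit.intercept:.2f}  (r = {fit.rvalue:.2f})")

The unequal-variance (Welch) form of the t-test is chosen here only because it does not
assume the two formations have the same spread of values; with such small samples the
confidence limits from the t distribution are correspondingly wide.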
This outline is not intended to make you an expert in statistics. It merely shows
examples of the use of statistics in petrography, in the hope that it will stimulate you
to take several courses or read up on your own. It is super-simplified and condensed,
and therefore omits a lot of material that should really be covered. For further
information refer to any standard textbook, especially those written for geologists:
Miller and Kahn, 1962; Krumbein and Graybill, 1965; Griffiths, 1967; Koch and Link,
1972; Davis, 1973.
The Normal Probability Curve. In order to understand some of the assumptions and
underlying principles, it is essential to study the statisticians’ most fundamental
concept, that of the normal probability curve. This is the basis for study of
experimental data of all kinds.
As a first step in the analysis of data from any field of science, one usually
constructs a frequency distribution. For example, if one is studying the batting
averages of baseball players, he would select convenient class intervals to divide the
entire range of data into about 10 to 20 classes and proceed to find how many batters
had averages between .200 and .210, how many between .210 and .220, and so on; here
the class interval would be .010. Or if an anthropologist were studying the lengths of
human thigh bones, he would first ascertain the spread between the largest and smallest
bone (say, for example, 12” to 31”), and divide this into a convenient number of classes.
Here a convenient class interval would be 1”, and he would proceed to find how many
thighbones were between 12” and 13”, how many between 13” and 14”, and so on. When
data of this type is plotted up in the form of a histogram or frequency curve, it is
usually found that most of the items are clustered around the central part of the
distribution with a rapid “tailing off” in the extremes. For example, far more baseball
players hit between .260 and .280 than hit between .320 and .340, or between .180 and
.200. Even fewer hit between .100 and .120, or between .380 and .400.
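The tallying procedure described above is easy to carry out by machine. The following
Python sketch, with invented batting averages, bins the data into classes .010 wide and
prints the resulting frequency distribution as a crude histogram; all numbers are
hypothetical.

    # A minimal sketch (data invented) of the tallying procedure: pick a class
    # interval, then count how many items fall into each class.
    import numpy as np

    averages = np.array([.198, .224, .243, .251, .262, .266, .271, .274,
                         .280, .285, .291, .295, .302, .311, .324, .338])

    interval = 0.010                            # class interval of .010
    edges = np.arange(0.190, 0.351, interval)   # class boundaries .190, .200, ... .350
    counts, _ = np.histogram(averages, bins=edges)

    for low, high, n in zip(edges[:-1], edges[1:], counts):
        print(f"{low:.3f}-{high:.3f}  {'*' * n}  ({n})")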
A great many types of data follow this distribution, and the type of frequency curve
resulting is called the “probability curve,” or the “normal curve,” or often a “Gaussian
curve” after Gauss, who was a pioneer in the field. The curve is defined as the kind of distribution resulting
if one had 100 well-balanced coins and tossed them all repeatedly to count the number
of heads appearing. Naturally, the most frequent occurrence would be 50 heads and 50