Page 90 - Statistics and Data Analysis in Geology
Statistics and Data Analysis in Geology - Chapter 4
Can predictions or estimations be made from the data? Can variables be related
or their effectiveness measured? Although such questions may not be explicitly
posed in each of the following discussions, you should examine the nature of the
methods and think about their applicability and the type of problems they may
help solve. The sample problems are only suggestions from the many that could
be used.
Geologists are concerned not only with the analysis of data in sequences, but
also with the comparison of two or more sequences. An obvious example is stratigraphic
correlation, either of measured sections or petrophysical well logs. A geologist's
motive for numerical correlation may be a simple desire for speed, as in
the production of geologic cross-sections from digitized logs stored in data banks.
Alternatively, he may be faced with a correlation problem where the recognition of
equivalency is beyond his ability. Subtle degrees of similarity, too slight for unaided
detection, may provide the clues that will allow him to make a decision where none
is otherwise possible. Numerical methods allow the geologist to consider many
variables simultaneously, a powerful extension of his pattern-recognition facilities.
Finally, because of the absolute invariance in operation of a computer program,
mathematical correlation provides a challenge to the human interpreter. If a geologist's
correlation disagrees with that established by computer, it is the geologist's
responsibility to determine the reason for the discrepancy. The forced scrutiny may
reveal complexities or biases not apparent during the initial examination. This is
not to say that the geologist should unthinkingly bend his interpretation to conform
with that of the computer. However, because modern programs for automatic
correlation are increasingly able to mimic (and extend) the mental processes of a
human interpreter, their output must be considered seriously.
Most techniques for comparing two or more sequences can be grouped into two
broad categories. In the first of these, the data sequences are assumed to match at
one position only, and we wish to determine the degree of similarity between the
two sequences. An example is the comparison of an X-ray diffraction chart with
a set of standards in an attempt to identify an unknown mineral. The chart and
standards can be compared only in one position, where intensities at certain angles
are compared to intensities of the standards at the same angles. Nothing is gained,
for example, by comparing X-ray intensity at 20° 2θ with the intensity at 30° 2θ on
another chart. Although the correspondence may be high, it is meaningless.
The fact that data such as these are in the form of sequences is irrelevant,
because each data point is considered to be a separate and distinct variable. The
intensity of diffracted radiation at 20° 2θ is one variable, and the intensity at 30° 2θ
is another. We will consider methods for the comparisons of such sequences in
greater detail in Chapter 6, when we discuss multivariate measures of similarity
and problems of classification and discrimination. In this class of problems, an observation's
location in a sequence merely serves to identify it as a specific variable,
and its location has no other significance.
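
As a rough illustration of this first category, the sketch below compares an unknown diffraction pattern with a set of standards at one fixed alignment, treating the intensity at each angle as a separate variable. The choice of the Pearson correlation coefficient as the similarity measure is an assumption made here for illustration only (the text defers specific measures to Chapter 6), and the mineral names, intensities, and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two sequences compared position by position."""
    if len(x) != len(y):
        raise ValueError("sequences must be sampled at the same positions")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_match(unknown, standards):
    """Score the unknown against every standard at the single valid alignment."""
    scores = {name: pearson(unknown, pattern) for name, pattern in standards.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical intensities measured at the same six diffraction angles.
standards = {
    "mineral_A": [5, 80, 10, 40, 5, 20],
    "mineral_B": [60, 5, 30, 5, 70, 10],
}
unknown = [7, 75, 12, 38, 6, 22]

best, scores = best_match(unknown, standards)
print(best, scores)
```

Note that the sequences are never shifted relative to one another: element i of the unknown is compared only with element i of each standard, so the ordering of the angles serves solely to pair up the variables.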
In contrast, some of the techniques we will discuss in this chapter regard data
sequences as samples from a continuous string of possible observations. There
is no a priori reason why one position of comparison should be better than any
other. These methods of cross comparison superficially resemble the mental process
of geologic correlation, but have the limitation that they assume the distance
or time scales of the two sequences being compared are the same. In historic time
series and sequences such as Holocene ice cores, this assumption is valid. In other