will be voted twice in the model, leading to an overemphasis of the importance of the correlated features.
For purposes of explanation, the NB classifier will be applied on training samples, such that each sample has three features ($X_1$, $X_2$, $X_3$) and a single label $y_i$, where $i = 1$ or $2$. Therefore, the NB classifier needs to accomplish the binary classification task of assigning a single label $y$, either $y_1$ or $y_2$, to a sample based on its feature values. As the first goal, the algorithm processes the training dataset to approximate the probability of a class $y_i$ for a given set of feature values ($X_1$, $X_2$, $X_3$), which is expressed as
$$P(y_i \mid X_1, X_2, X_3) = \frac{P(X_1 \mid y_i)\,P(X_2 \mid y_i)\,P(X_3 \mid y_i)\,P(y_i)}{P(X_1)\,P(X_2)\,P(X_3)} \tag{9.5}$$
For a specific dataset, the denominator in Eq. (9.5) is constant. So, Eq. (9.5)
can be simplified to a proportionality expressed as
$$P(y_i \mid X_1, X_2, X_3) \propto P(X_1 \mid y_i)\,P(X_2 \mid y_i)\,P(X_3 \mid y_i)\,P(y_i) \tag{9.6}$$
In Eq. (9.6), each individual $P(X_j \mid y_i)$, where $j = 1, 2,$ or $3$, can be calculated based on an assumption about the distribution of the features. For discrete features, the feature distribution is assumed to follow a multinomial distribution, whereas for continuous-valued features, it is assumed to follow a Gaussian distribution. To calculate the statistical parameters (such as the mean and variance) of the feature distributions, the dataset is first segmented by class, and then the parameters are calculated for each class to enable the calculation of $P(X_j \mid y_i)$. Finally, the algorithm estimates the probability that a given sample with known feature values belongs to a certain class by picking the $y_i$ that leads to the largest value of $P(X_1 \mid y_i)P(X_2 \mid y_i)P(X_3 \mid y_i)P(y_i)$. This statement is mathematically represented as
$$y = \underset{y_i}{\operatorname{argmax}}\; P(X_1 \mid y_i)\,P(X_2 \mid y_i)\,P(X_3 \mid y_i)\,P(y_i) \tag{9.7}$$
This is referred to as the maximum a posteriori (MAP) decision rule; in other words, pick the hypothesis that is most probable.
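To make this workflow concrete, the following is a minimal sketch of Gaussian naive Bayes for the three-feature, two-class case described above. The variable names and the small synthetic dataset are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical training data: rows are samples, columns are the
# three features (X1, X2, X3); labels are 1 or 2.
features = np.array([[1.0, 2.1, 0.5],
                     [1.2, 1.9, 0.4],
                     [3.1, 0.2, 2.2],
                     [2.9, 0.4, 2.0]])
labels = np.array([1, 1, 2, 2])

classes = np.unique(labels)

# Segment the dataset by class and estimate the Gaussian parameters
# (mean, variance) of each feature, plus the class prior P(y_i).
means = {c: features[labels == c].mean(axis=0) for c in classes}
variances = {c: features[labels == c].var(axis=0) for c in classes}
priors = {c: np.mean(labels == c) for c in classes}

def gaussian_pdf(x, mean, var):
    """P(X_j | y_i) under the Gaussian assumption."""
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def predict(sample):
    """MAP decision rule of Eq. (9.7): pick the class that maximizes
    P(X1|y_i) P(X2|y_i) P(X3|y_i) P(y_i)."""
    posteriors = {c: priors[c] * np.prod(gaussian_pdf(sample, means[c], variances[c]))
                  for c in classes}
    return max(posteriors, key=posteriors.get)

print(predict(np.array([1.1, 2.0, 0.45])))  # prints 1
```

In practice, the product of many small probabilities can underflow, so production implementations such as scikit-learn's GaussianNB compare sums of log-probabilities instead of raw products.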
4.1.7 Artificial neural network (ANN) classifier
ANN is composed of consecutive layers, where each layer contains several computational units in parallel (Fig. 9.19). Each computational unit is referred to as a neuron. The layer of ANN that reads the features is called the input layer, while the layer of ANN that generates the final targets is called the output layer. In our case, the input layer has 28 neurons to read the 28 travel-time measurements for each sample. The output layer has either four or eight neurons, based on the number of classes to be assigned. Any layer between the input layer and the output layer is called a hidden layer. The output of the previous layer is taken as the input for the next layer. In a densely connected network, each neuron in a layer is connected to all the neurons in the previous layer.
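The architecture described above maps directly onto a few lines of code. Below is a minimal sketch, using Keras, of a densely connected network with a 28-neuron input layer, one hidden layer, and a four-class softmax output; the hidden-layer width, activations, optimizer, and the random training data are illustrative assumptions, not values from the text:

```python
import numpy as np
from tensorflow import keras

# Densely connected ANN: 28 input neurons (one per travel-time
# measurement), one hidden layer, and a 4-class output layer.
model = keras.Sequential([
    keras.Input(shape=(28,)),
    keras.layers.Dense(16, activation="relu"),    # hidden layer (width is an assumption)
    keras.layers.Dense(4, activation="softmax"),  # one output neuron per class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical training data: 100 samples of 28 travel-time features
# with integer class labels 0-3.
X = np.random.rand(100, 28)
y = np.random.randint(0, 4, size=100)
model.fit(X, y, epochs=5, verbose=0)
```

Each Dense layer connects every one of its neurons to all the neurons in the preceding layer, which is exactly the dense connectivity described above; replacing the final layer with an eight-neuron Dense layer handles the eight-class case.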

