will be voted twice in the model, leading to an overemphasis of the importance
of the correlated features.
For purposes of explanation, the NB classifier will be applied on training samples, such that each sample has three features ($X_1$, $X_2$, $X_3$) and a single label ($y_i$), where $i = 1$ or $2$. Therefore, the NB classifier needs to accomplish the binary classification task of assigning a single label $y$, either $y_1$ or $y_2$, to a sample based on its feature values. As the first goal, the algorithm processes the training dataset to approximate the probability of a class $y_i$ for a given set of feature values ($X_1$, $X_2$, $X_3$), which is expressed as
$$P(y_i \mid X_1, X_2, X_3) = \frac{P(X_1 \mid y_i)\, P(X_2 \mid y_i)\, P(X_3 \mid y_i)\, P(y_i)}{P(X_1)\, P(X_2)\, P(X_3)} \tag{9.5}$$
For a specific set of feature values, the denominator in Eq. (9.5) is constant because it does not depend on the class $y_i$. So, Eq. (9.5) can be simplified to a proportionality expressed as
$$P(y_i \mid X_1, X_2, X_3) \propto P(X_1 \mid y_i)\, P(X_2 \mid y_i)\, P(X_3 \mid y_i)\, P(y_i) \tag{9.6}$$
In Eq. (9.6), each individual $P(X_j \mid y_i)$, where $j = 1$, $2$, or $3$, can be calculated based on an assumption about the distribution of the features. For discrete features, the feature distribution is assumed to follow a multinomial distribution, whereas for continuous-valued features, the feature distribution is assumed to follow a Gaussian distribution. To calculate the statistical parameters (such as mean and variance) of the feature distributions, the dataset is first segmented by class, and the parameters are then calculated for each class to enable the calculation of $P(X_j \mid y_i)$. Finally, the algorithm estimates the probability that a given sample with known feature values belongs to a certain class by picking the $y_i$ that leads to the largest value of $P(X_1 \mid y_i)\,P(X_2 \mid y_i)\,P(X_3 \mid y_i)\,P(y_i)$. This statement is mathematically represented as
$$y = \arg\max_{y_i}\; P(X_1 \mid y_i)\, P(X_2 \mid y_i)\, P(X_3 \mid y_i)\, P(y_i) \tag{9.7}$$
               This is referred to as the maximum a posteriori decision rule; in other words,
            pick the hypothesis that is most probable.
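
To make these steps concrete, below is a minimal from-scratch sketch of a Gaussian NB classifier in Python with NumPy. The toy dataset, function names, and the small variance floor are illustrative assumptions, not part of the text; the logic follows Eqs. (9.6) and (9.7): segment the training set by class, estimate the per-class prior, means, and variances, and apply the MAP rule in log space to avoid numerical underflow.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Segment the training set by class and estimate, for each class,
    the prior P(y_i) and the per-feature mean and variance."""
    params = {}
    for cls in np.unique(y):
        Xc = X[y == cls]
        params[cls] = {
            "prior": len(Xc) / len(X),     # P(y_i)
            "mean": Xc.mean(axis=0),       # per-feature means
            "var": Xc.var(axis=0) + 1e-9,  # per-feature variances (floored)
        }
    return params

def predict(params, x):
    """Pick the class maximizing P(X1|yi)P(X2|yi)P(X3|yi)P(yi),
    i.e., the MAP rule of Eq. (9.7), evaluated in log space."""
    best_cls, best_log_post = None, -np.inf
    for cls, p in params.items():
        # Sum of log Gaussian likelihoods over features (independence assumption)
        log_lik = -0.5 * np.sum(
            np.log(2 * np.pi * p["var"]) + (x - p["mean"]) ** 2 / p["var"]
        )
        log_post = log_lik + np.log(p["prior"])  # log P(X|yi) + log P(yi)
        if log_post > best_log_post:
            best_cls, best_log_post = cls, log_post
    return best_cls

# Hypothetical binary dataset with three continuous features (X1, X2, X3)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(2.0, 1.0, (50, 3))])
y = np.array([1] * 50 + [2] * 50)
model = fit_gaussian_nb(X, y)
print(predict(model, np.array([1.9, 2.1, 2.0])))  # expected output: 2
```

In practice, the same fit-and-predict workflow is available off the shelf, for example as GaussianNB in scikit-learn, which likewise assumes Gaussian feature likelihoods for continuous-valued features.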
            4.1.7 Artificial neural network (ANN) classifier
ANN is composed of consecutive layers, where each layer contains several computational units operating in parallel (Fig. 9.19). Each computational unit is referred to as a neuron. The layer of the ANN that reads the features is called the input layer, while the layer that generates the final targets is called the output layer. In our case, the input layer has 28 neurons to read the 28 travel-time measurements for each sample. The output layer has either four or eight neurons, depending on the number of classes to be assigned. Any layer between the input layer and the output layer is called a hidden layer. The output of each layer is taken as the input of the next layer. In a densely connected network, each neuron in a layer is connected to all the neurons in