Page 19 - Machine Learning for Subsurface Characterization



            driven model is used to predict targets/outcomes based on the features/attributes
            of new, unseen samples during the model deployment. In unsupervised learning
            (e.g., clustering and transformation), a data-driven model learns to generate an
            outcome based on the features/attributes of samples without any prior informa-
            tion about the outcomes. In reinforcement learning (which tends to be very chal-
            lenging), a data-driven model learns to perform a specific task by interacting
            with an environment to receive a reward based on the actions performed by
            the model toward accomplishing the task. In reinforcement learning, the model
            learns the policy for a specific task by optimizing the cumulative reward
            obtained from the environment. These three learning techniques have several
            day-to-day applications; for instance, supervised learning is commonly used
            in spam detection. The spam detection model is trained on emails labeled
            as spam or not spam; after learning from the training dataset and being
            evaluated on the testing dataset, the trained model can classify a new
            email as spam or not spam. Unsupervised learning is used in
            marketing, where customers are segmented based on how similar or dissimilar
            their purchasing trends are to those of other customers; for instance,
            Netflix’s recommendation engine uses the similarity/dissimilarity between
            what different users have watched when recommending movies. Reinforcement
            learning was used to train DeepMind’s AlphaGo to beat world champions at
            the game of Go. Reinforcement learning has also been used to train
            chess-playing engines, where the model is penalized for moves that lead to
            losing a piece and rewarded for moves that lead to checkmate.
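To make the idea of optimizing a cumulative reward concrete, the following is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm (the text does not name a specific one). Everything here is an assumption made for illustration: the one-dimensional corridor environment, the reward values, the hyperparameters, and the function name train_corridor_agent are all hypothetical.

```python
import random


def train_corridor_agent(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                         epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (illustrative only).

    States are 0..n_states-1. The agent starts in state 0, and an episode ends
    when it reaches the last state, where it receives a reward of +1.
    Actions: 0 = step left, 1 = step right. Every other step costs -0.01.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # q[state][action] estimates the discounted cumulative reward.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore: random action
            else:
                best = max(q[state])       # exploit: greedy, ties broken randomly
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])

            # Environment step: the left wall keeps the agent in state 0.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else -0.01

            # Q-learning update toward reward + discounted best future value.
            if next_state == goal:
                target = reward
            else:
                target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if state == goal:
                break

    # Learned policy: the preferred action in each non-terminal state.
    policy = [0 if q[s][0] > q[s][1] else 1 for s in range(goal)]
    return q, policy


q_table, policy = train_corridor_agent()
print(policy)
```

The model is never told which action is correct; it only receives rewards from the environment, and the policy emerges from the learned value estimates.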
               A machine learning method first processes the training dataset to build a
            data-driven model; following that, the performance of the newly developed
            model is evaluated against the testing dataset. After confirming the accuracy
            and precision of the data-driven model on the testing dataset, the model
            is deployed on the new dataset. These three types of datasets, namely,
            training, testing, and new datasets, comprise measurements of specific
            features for numerous samples. The training and testing datasets, when used in supervised
            learning, contain additional measurements of the targets/outcomes. A super-
            vised learning technique tries to functionally relate the features to the targets
            for all the samples in the dataset. In contrast, for unsupervised learning,
            the data-driven model is developed without targets; in other words, no
            targets are considered during the training and testing stages of
            unsupervised learning. Information about the targets is never available in
            the new dataset, because the trained model is deployed on the new dataset
            precisely to compute the desired targets or outcomes.
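The train, test, and deploy sequence described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' workflow: the synthetic two-feature data, the nearest-centroid classifier, and the names make_samples, fit_centroids, and predict are all assumptions introduced here.

```python
import random


def make_samples(n, rng):
    """Synthetic two-feature samples with a binary target (illustrative only)."""
    features, targets = [], []
    for _ in range(n):
        label = rng.randrange(2)
        center = (0.0, 0.0) if label == 0 else (4.0, 4.0)
        features.append((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)))
        targets.append(label)
    return features, targets


def fit_centroids(features, targets):
    """Training: learn one centroid per class from the features AND the targets."""
    centroids = {}
    for label in set(targets):
        members = [f for f, t in zip(features, targets) if t == label]
        centroids[label] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids


def predict(centroids, features):
    """Assign each sample to the class whose centroid is nearest."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(centroids, key=lambda c: sq_dist(centroids[c], f))
            for f in features]


rng = random.Random(42)
train_X, train_y = make_samples(200, rng)  # features and targets
test_X, test_y = make_samples(100, rng)    # features and targets
new_X, _ = make_samples(5, rng)            # targets discarded: unknown at deployment

# 1. Build the model on the training dataset.
model = fit_centroids(train_X, train_y)

# 2. Evaluate it on the testing dataset, where the targets are known.
test_pred = predict(model, test_X)
test_accuracy = sum(p == t for p, t in zip(test_pred, test_y)) / len(test_y)

# 3. Deploy it on the new dataset, where only the features are available.
new_pred = predict(model, new_X)
print(test_accuracy, new_pred)
```

Only the first two steps ever see the targets; in the third step the model's predictions stand in for the targets that the new dataset lacks.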


            1.3 Types of outliers
            In the context of this work, outliers can be broadly categorized into three types:
            point/global, contextual, and collective outliers [1]. Point/global outliers refer to
            individual data points or samples that significantly deviate from the overall
            distribution of the entire dataset or from the distribution of a certain combination of