Page 19 - Machine Learning for Subsurface Characterization
driven model is used to predict targets/outcomes based on the features/attributes
of new, unseen samples during the model deployment. In unsupervised learning
(e.g., clustering and transformation), a data-driven model learns to generate an
outcome based on the features/attributes of samples without any prior
information about the outcomes. In reinforcement learning (which tends to be very
challenging), a data-driven model learns to perform a specific task by interacting
with an environment to receive a reward based on the actions performed by
the model toward accomplishing the task. In reinforcement learning, the model
learns the policy for a specific task by optimizing the cumulative reward
obtained from the environment. These three learning techniques have several
day-to-day applications. For instance, supervised learning is commonly used
in spam detection. A spam detection model is trained on emails labeled as
spam or not spam; after learning from the training dataset and being
evaluated on the testing dataset, the trained model can detect whether a new
email is spam. Unsupervised learning is used in
marketing, where customers are categorized/segmented based on the similarity/
dissimilarity of their purchasing trends compared with those of other customers;
for instance, Netflix’s recommendation engine uses the similarity/dissimilarity
between what different users have watched when recommending movies.
Reinforcement learning was used to train DeepMind’s AlphaGo to beat world
champions at the game of Go. Reinforcement learning has also been used to train
chess-playing engines, where the model is penalized for moves that lead to
losing a piece and rewarded for moves that lead to a checkmate.
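The reward-driven loop described above can be illustrated with a minimal sketch. It assumes a hypothetical toy environment (a corridor of five cells with a reward for reaching the last cell) and uses tabular Q-learning, with exhaustive, deterministic exploration standing in for the stochastic exploration typically used in practice. The names `step`, `Q`, `GAMMA`, and `ALPHA` are illustrative choices and are not taken from the text.

```python
# Toy environment (hypothetical): a corridor of five cells, numbered 0..4.
# The task is to reach cell 4; the only reward is given on arrival there.
GOAL = 4
ACTIONS = (-1, +1)   # step left, step right
GAMMA = 0.9          # discount applied to future rewards (assumed value)
ALPHA = 0.5          # learning rate (assumed value)


def step(state, action):
    """Environment response: next state, reward, and whether the task ended."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


# Q[state][action] estimates the discounted cumulative reward obtained by
# taking `action` in `state` and acting greedily afterwards.
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(GOAL)}

for sweep in range(200):
    # Deterministic exploration: try every action in every non-goal cell.
    for state in range(GOAL):
        for action in ACTIONS:
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(Q[next_state].values())
            target = reward + GAMMA * future
            # Move the estimate part of the way toward the observed target.
            Q[state][action] += ALPHA * (target - Q[state][action])

# The learned policy picks, in each cell, the action with the highest estimate.
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(GOAL)]
```

In this sketch the model is never told which move is correct; it only observes rewards, and the greedy policy that emerges steps right in every cell because that maximizes the discounted cumulative reward.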
A machine learning method first processes the training dataset to build a
data-driven model; following that, the performance of the newly developed
model is evaluated against the testing dataset. After confirming the accuracy
and precision of the data-driven model on the testing dataset, the model
is deployed on the new dataset. These three types of datasets, namely, training,
testing, and new datasets, comprise measurements of specific features for
numerous samples. The training and testing datasets, when used in supervised
learning, contain additional measurements of the targets/outcomes. A supervised
learning technique tries to functionally relate the features to the targets
for all the samples in the dataset. In contrast, for unsupervised learning,
the data-driven model development takes place without the targets; in other
words, there are no targets to be considered during the training and testing
stages of unsupervised learning. Information about the targets is
never available in the new dataset; this is precisely why the trained models
are deployed on the new dataset, to compute the desired targets or outcomes.
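The train, test, and deploy sequence for the supervised case can be sketched as follows. This is a minimal illustration under stated assumptions: the feature values and class labels are invented for the example, and a nearest-centroid classifier is used only because it is short; the text does not prescribe any particular model. The helper names `fit_centroids` and `predict` are hypothetical.

```python
from statistics import mean


def fit_centroids(features, targets):
    """Training: learn one centroid (mean feature vector) per target class."""
    centroids = {}
    for label in set(targets):
        rows = [x for x, y in zip(features, targets) if y == label]
        centroids[label] = tuple(mean(column) for column in zip(*rows))
    return centroids


def predict(centroids, features):
    """Assign each sample to the class whose centroid is nearest."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [
        min(centroids, key=lambda label: squared_distance(centroids[label], x))
        for x in features
    ]


# Hypothetical samples with two features each. Targets are available only
# for the training and testing datasets.
train_X = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (4.0, 4.2), (3.8, 4.1), (4.2, 3.9)]
train_y = ["A", "A", "A", "B", "B", "B"]
test_X = [(1.1, 1.0), (3.9, 4.0)]
test_y = ["A", "B"]
new_X = [(0.9, 1.1), (4.1, 4.0)]  # new dataset: features only, no targets

# 1. Build the data-driven model from the training dataset.
model = fit_centroids(train_X, train_y)

# 2. Evaluate the model against the testing dataset.
test_pred = predict(model, test_X)
accuracy = sum(p == t for p, t in zip(test_pred, test_y)) / len(test_y)

# 3. Deploy the model on the new dataset to compute the unknown targets.
new_pred = predict(model, new_X)
```

Note that `new_X` has no accompanying target list: the predictions in `new_pred` are the only target information available for those samples, which mirrors the role of the new dataset described above.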
1.3 Types of outliers
In the context of this work, outliers can be broadly categorized into three types:
point/global, contextual, and collective outliers [1]. Point/global outliers refer to
individual data points or samples that deviate significantly from the overall
distribution of the entire dataset or from the distribution of a certain combination of