Page 10 - Machine Learning for Subsurface Characterization
and unexciting tasks, freeing us to take on higher-value tasks. ML is a tool for
enhancing human capabilities rather than replacing humans. An important thing
to remember is that all this requires human involvement, assessment, and feedback,
without which it is difficult to develop reliable, consistent, and robust ML/DL
models.
Challenges and precautions when using machine learning
ML algorithms are only as good as the data that go into them. A trained model
tends to fail on data that are dissimilar to the training data. ML models are not
suitable for edge/rare cases because the model cannot learn enough statistical
information about such cases; as a result, the models produce unreliable results
with high uncertainty for such cases. Poor data hygiene leads to a “garbage in,
garbage out” scenario. Consequently, a lot of effort is required to transform messy,
unstructured data into clean, structured data suitable for consumption by the
ML model [8]. ML relies heavily on manual services for creating labels/targets/annotations
and for data cleaning/preprocessing; additional services are then
required for structuring the data. All this makes the ML workflow slow,
tedious, and time consuming.
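The failure on dissimilar data described above can be illustrated with a minimal sketch. It assumes a toy setting that is not from the text: a straight line is fitted by least squares to samples of sin(x) on the interval [0, 1], and the fitted line is then evaluated both inside that interval and on the unseen interval [3, 4]. The function names and intervals are hypothetical choices made only for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (toy stand-in for a trained model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def mean_abs_error(a, b, xs):
    """Mean absolute error of the fitted line against the true function sin(x)."""
    return sum(abs((a * x + b) - math.sin(x)) for x in xs) / len(xs)

# Training data: sin(x) sampled on [0, 1] (assumed toy dataset).
train_x = [i / 100 for i in range(101)]
train_y = [math.sin(x) for x in train_x]
a, b = fit_line(train_x, train_y)

# Test data similar to the training data: new points inside [0, 1].
similar_x = [0.005 + i / 100 for i in range(100)]
# Test data dissimilar to the training data: points in [3, 4].
dissimilar_x = [3.0 + i / 100 for i in range(101)]

err_in = mean_abs_error(a, b, similar_x)
err_out = mean_abs_error(a, b, dissimilar_x)
print(f"error on similar data:    {err_in:.3f}")
print(f"error on dissimilar data: {err_out:.3f}")
```

In this sketch the model is accurate where it has seen data and badly wrong where it has not, even though nothing about the model changed between the two evaluations.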
ML systems have yet to figure out ways to accomplish unsupervised learning,
to learn from a very limited amount of data under limited computational
resources, or to train without a lot of human intervention. DL workflows require
a huge amount of information and large computational resources to succeed at
even basic tasks. ML tends to perform poorly in learning new concepts and
extending that learning to new contexts. A major concern is the so-called “curse
of dimensionality,” where having too many features/attributes (high-
dimensional data) and not enough observations/samples (small dataset) hinders
the model development and performance. Also, several ML proponents ignore
the complex challenges faced in the real world when taking an ML model from a
research paper or a controlled study to an engineered product for real-world
deployment [9]. ML practitioners have noticed that these methods generally
pick up patterns and relationships that are inconsistent and do not honor logical
reasoning. Such unexpected relationships and trends learned by these methods
will ultimately invalidate the results during real-world deployment. Moreover,
it has been demonstrated that it is easy to trick ML models into learning inconsistent
patterns or generating unreliable results. A branch of DL research referred to as
adversarial attacks is dedicated to developing new ways of fooling deep learning
techniques. ML-driven automated systems can be severely affected by such adversarial
attacks. Recent ML research has not shown much progress in these areas.
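One common way to see the curse of dimensionality mentioned above is distance concentration: with a fixed number of samples, the nearest and farthest samples from a query point become almost equally far away as the number of features grows, so "similar" and "dissimilar" samples are hard to tell apart. The sketch below is only an illustration under assumed settings that are not from the text (200 uniformly random samples, a query at the center of the unit hypercube, and 2 versus 500 features); the function name is hypothetical.

```python
import math
import random

def nearest_to_farthest_ratio(n_features, n_samples=200, seed=0):
    """Ratio of the smallest to the largest Euclidean distance from a
    query point to n_samples uniformly random points in [0, 1]^n_features.
    A ratio close to 1 means all samples look almost equally distant."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    query = [0.5] * n_features  # assumed query: center of the hypercube
    distances = [
        math.dist(query, [rng.random() for _ in range(n_features)])
        for _ in range(n_samples)
    ]
    return min(distances) / max(distances)

# Same number of samples, very different numbers of features.
ratio_low_dim = nearest_to_farthest_ratio(n_features=2)
ratio_high_dim = nearest_to_farthest_ratio(n_features=500)
print(f"nearest/farthest ratio with   2 features: {ratio_low_dim:.2f}")
print(f"nearest/farthest ratio with 500 features: {ratio_high_dim:.2f}")
```

With few features the nearest sample is much closer than the farthest one; with many features and the same small dataset the ratio moves toward 1, which is one reason high-dimensional data with few observations hinders model development.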
A lot of the AI/ML hype originates from the extrapolation of current trends
and recent successes. When vendors describe the built-in machine learning