Machine Learning for Subsurface Characterization

Preface


            and unexciting tasks, freeing us to take on higher-value tasks. ML is a tool for
            enhancing human capabilities rather than replacing humans. It is important to
            remember that all this requires human involvement, assessment, and feedback,
            without which it is difficult to develop reliable, consistent, and robust ML/DL
            models.

            Challenges and precautions when using machine learning

            ML algorithms are only as good as the data that go into them. A trained model
            is likely to fail on data that are dissimilar to the training data. ML models are
            not suitable for edge/rare cases because the model cannot learn enough statistical
            information about such cases; as a result, the models produce unreliable results
            with high uncertainty for such cases. Poor data hygiene leads to a “garbage in,
            garbage out” scenario. Consequently, a lot of effort is required to transform messy,
            unstructured data into clean, structured data suitable for consumption by the
            ML model [8]. ML relies heavily on manual effort to create labels/targets/
            annotations and to clean/preprocess the data, after which additional effort is
            required to structure the data; all this makes the ML workflow slow, tedious,
            and time consuming.
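
               The point about dissimilar data can be illustrated with a minimal sketch.
            The example is not taken from the cited references; it assumes a hypothetical
            quadratic relationship (the function truth) and a deliberately simple model, a
            straight line fitted by ordinary least squares on inputs between 0 and 1. The
            fitted line is then evaluated on inputs inside that range and on inputs far
            outside it. All names and numbers are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (closed-form solution)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mean_abs_error(model, xs, ys):
    """Mean absolute error of the fitted line on the given points."""
    a, b = model
    return sum(abs(a * x + b - y) for x, y in zip(xs, ys)) / len(xs)


def truth(x):
    # Hypothetical "true" relationship, chosen only for illustration.
    return x ** 2


# Training data: 21 evenly spaced inputs covering the range 0.0 to 1.0.
train_x = [i / 20 for i in range(21)]
train_y = [truth(x) for x in train_x]
model = fit_line(train_x, train_y)

# Test data similar to the training data (inside the training range).
similar_x = [0.025 + i / 20 for i in range(20)]
# Test data dissimilar to the training data (far outside the training range).
dissimilar_x = [3.0 + i / 20 for i in range(21)]

err_similar = mean_abs_error(model, similar_x, [truth(x) for x in similar_x])
err_dissimilar = mean_abs_error(model, dissimilar_x, [truth(x) for x in dissimilar_x])

print(f"error on similar data:    {err_similar:.3f}")
print(f"error on dissimilar data: {err_dissimilar:.3f}")
```

               In this sketch the error on the dissimilar inputs is much larger than the
            error on the similar inputs, even though the same fitted model is used in both
            cases. A more flexible model would change the numbers but not the underlying
            issue: nothing in the training data constrains the model outside the sampled
            range.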
               ML systems have yet to find ways to accomplish unsupervised learning,
            to learn from very limited amounts of data under limited computational
            resources, or to train without substantial human intervention. DL workflows require
            huge amounts of information and large computational resources to succeed at
            even basic tasks. ML tends to perform poorly in learning new concepts and
            extending that learning to new contexts. A major concern is the so-called “curse
            of dimensionality,” where having too many features/attributes (high-
            dimensional data) and not enough observations/samples (small dataset) hinders
            model development and performance. Also, several ML proponents overlook
            the complex challenges faced when taking an ML model from a
            research paper or a controlled study to an engineered product for real-world
            deployment [9]. ML practitioners have noticed that these methods often
            pick up patterns and relationships that are inconsistent and do not honor logical
            reasoning. Such unexpected relationships and trends learned by these methods
            can ultimately invalidate the results during real-world deployment. Moreover,
            it has been demonstrated that it is easy to trick ML models into learning
            inconsistent patterns or generating unreliable results. A branch of DL research
            referred to as adversarial attacks is dedicated to developing new ways of fooling
            deep learning techniques. ML-driven automated systems can be severely affected
            by such adversarial attacks. Recent ML research has not shown much progress in
            these areas.
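
               The “curse of dimensionality” mentioned above can be made concrete with a
            small experiment. The sketch below is an illustrative assumption, not a method
            from this book: it generates synthetic two-class data in which only the first
            feature carries information, trains a 1-nearest-neighbor classifier on 40
            samples, and compares the accuracy with 0 and with 200 added noise features.
            The function names, sample sizes, and noise levels are all hypothetical choices.

```python
import random


def make_dataset(n_samples, n_noise_features, rng):
    """Two classes separated along feature 0; every other feature is pure noise."""
    data = []
    for i in range(n_samples):
        label = i % 2
        center = 1.0 if label == 1 else -1.0
        informative = rng.gauss(center, 0.5)
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise_features)]
        data.append(([informative] + noise, label))
    return data


def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def one_nn_accuracy(train, test):
    """Fraction of test samples whose nearest training sample has the same label."""
    correct = 0
    for features, label in test:
        _, predicted = min(
            ((squared_distance(features, f), lab) for f, lab in train),
            key=lambda pair: pair[0],
        )
        correct += predicted == label
    return correct / len(test)


def run(n_noise_features, seed=0):
    """Train on 40 samples, test on 100, with the given number of noise features."""
    rng = random.Random(seed)
    train = make_dataset(40, n_noise_features, rng)
    test = make_dataset(100, n_noise_features, rng)
    return one_nn_accuracy(train, test)


acc_low_dim = run(n_noise_features=0)
acc_high_dim = run(n_noise_features=200)

print(f"1 feature, 40 samples:    accuracy = {acc_low_dim:.2f}")
print(f"201 features, 40 samples: accuracy = {acc_high_dim:.2f}")
```

               With the sample count held fixed, the uninformative features dominate the
            distance computation, so the classifier with 201 features is expected to do
            markedly worse than the one that sees only the informative feature. Remedies
            such as feature selection or collecting more samples are outside the scope of
            this sketch.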
               A lot of the AI/ML hype originates from the extrapolation of current trends
            and recent successes. When vendors describe the built-in machine learning