Machine Learning for Subsurface Characterization

Unsupervised outlier detection techniques, Chapter 1


the steps laid out in Fig. 1.3A. To ensure a controlled environment for our investigation, we created four distinct validation datasets containing outlier/inlier labels assigned by a human expert. The ability of the unsupervised ODTs to correctly detect outliers and inliers is analyzed using various evaluation metrics. Note that real-world implementations of unsupervised ODTs generally follow the steps laid out in Fig. 1.3B, without any prior knowledge of which samples are outliers and which are inliers; consequently, there is no way to evaluate the unsupervised ODTs during real-world implementations and choose the best one. Nonetheless, our comparative study will help identify the unsupervised ODT that performs best on various types of well-log datasets with minimal hyperparameter tuning.
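To make the evaluation step concrete, here is a minimal sketch of how an ODT's binary predictions might be scored against the expert-assigned labels. This section does not name the metrics used, so precision, recall, F1 score, and balanced accuracy are assumed here for illustration; the function name `outlier_detection_scores` and the toy label lists are hypothetical.

```python
from typing import Dict, Sequence


def outlier_detection_scores(
    expert: Sequence[int], predicted: Sequence[int]
) -> Dict[str, float]:
    """Score ODT predictions against expert labels (1 = outlier, 0 = inlier).

    The metric choice is an assumption for illustration; it is not taken
    from the chapter.
    """
    if len(expert) != len(predicted):
        raise ValueError("expert and predicted labels must have the same length")
    pairs = list(zip(expert, predicted))
    tp = sum(1 for e, p in pairs if e == 1 and p == 1)  # outliers correctly flagged
    fp = sum(1 for e, p in pairs if e == 0 and p == 1)  # inliers wrongly flagged
    fn = sum(1 for e, p in pairs if e == 1 and p == 0)  # outliers missed
    tn = sum(1 for e, p in pairs if e == 0 and p == 0)  # inliers correctly kept
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical expert labels and ODT predictions for eight depth samples.
expert_labels = [0, 0, 0, 1, 1, 0, 0, 1]
odt_labels = [0, 1, 0, 1, 0, 0, 0, 1]
scores = outlier_detection_scores(expert_labels, odt_labels)
print(scores)
```

Because outliers are usually a small fraction of the depth samples, outlier-class recall and precision tend to be more informative than plain accuracy, which a detector could inflate simply by labeling every sample as an inlier.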

4.1 Description of the dataset used for the comparative study of unsupervised ODTs
Log data used for this work were obtained from two wells in different reservoirs. Gamma ray (GR), density (RHOB), neutron porosity (NPHI), compressional sonic transit time (DTC), and deep and shallow resistivity (RT and RXO) logs from Well 1 are available within the depth interval of 580–5186 ft, comprising 5617 depth samples; hereafter, this dataset is referred to as the onshore dataset. The onshore dataset contains log responses from limestone, sandstone, dolomite, and shale lithologies. Gamma ray (GR), density (DEN), neutron porosity (NEU), compressional sonic transit time (AC), deep and medium resistivity (RDEP and RMED), and photoelectric factor (PEF) logs from Well 2 are available within the depth interval of 8333–13327 ft, comprising 9986 depth samples; hereafter, this dataset is referred to as the offshore dataset. The offshore dataset contains log responses from limestone, sandstone, dolomite, shale, and anhydrite lithologies.


4.2 Data preprocessing
Data preprocessing refers to the transformations applied to data before feeding them to the machine learning algorithm [15]. The primary purpose of data preprocessing is to convert the raw data into a clean dataset that the machine learning workflow can process. Typical data preprocessing tasks include fixing null/NaN values, imputing missing values, scaling the features, normalizing samples, removing anomalies, encoding qualitative/nominal categorical features, and reformatting the data. Data preprocessing is an important step because a data-driven model built using machine learning is only as good as the data used to build it.
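As an illustration of two of the tasks listed above (imputing missing values and scaling the features), the sketch below fills NaN log readings with the median of the valid readings and then applies z-score standardization to a single feature. This section does not say which imputation or scaling method the chapter uses, so median imputation and z-score scaling are assumptions; the function names `impute_median` and `standardize` and the gamma-ray values are hypothetical.

```python
import math
from statistics import mean, median, pstdev
from typing import List


def impute_median(column: List[float]) -> List[float]:
    """Replace NaN entries with the median of the non-missing entries (assumed method)."""
    valid = [v for v in column if not math.isnan(v)]
    if not valid:
        raise ValueError("column has no valid samples to impute from")
    fill = median(valid)
    return [fill if math.isnan(v) else v for v in column]


def standardize(column: List[float]) -> List[float]:
    """Rescale a feature to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical gamma-ray readings (API units) with one missing depth sample.
gr = [45.0, float("nan"), 60.0, 120.0, 75.0]
gr_clean = impute_median(gr)  # NaN replaced by median(45, 60, 120, 75) = 67.5
gr_scaled = standardize(gr_clean)
print(gr_clean)
print(gr_scaled)
```

Median imputation is used in this sketch rather than mean imputation because the median is less sensitive to the very outliers that the ODTs are meant to detect.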

4.2.1 Feature transformation: Convert R to log(R)
Machine learning models tend to be more efficient when the features/attributes are not skewed and have relatively similar distributions and variances. Resistivity