Page 21 - Machine Learning for Subsurface Characterization



            fluids; consequently, these measurements do not necessarily exhibit a
            Gaussian distribution and generally exhibit considerable correlations among
            the features. Data-driven outlier detection techniques built using machine
            learning are more robust in detecting outliers than simple
            statistical tools.
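
The contrast between a simple statistical tool and a data-driven detector can be sketched on a toy example. The data below are synthetic and purely illustrative (two strongly correlated features plus one sample that breaks the correlation), and the k-nearest-neighbor distance score is only a simple stand-in for the ML-based detectors discussed here, not the method used by the authors. The names `zscore_flags`, `knn_distance_flags`, and `outlier_fraction` are hypothetical.

```python
import math
import statistics

# Synthetic, deterministic data (an assumption for illustration only):
# two strongly correlated features, y roughly equal to x, plus one
# sample that lies within the range of each feature but breaks the
# correlation between them.
inliers = [(0.5 * i, 0.5 * i + 0.1 * (-1) ** i) for i in range(21)]
samples = inliers + [(2.0, 8.0)]  # the last sample (index 21) is the outlier


def zscore_flags(data, threshold=3.0):
    """Flag samples whose value in any single feature lies more than
    `threshold` standard deviations from that feature's mean."""
    columns = list(zip(*data))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [
        i for i, row in enumerate(data)
        if any(abs(v - m) / s > threshold for v, m, s in zip(row, means, stds))
    ]


def knn_distance_flags(data, k=3, outlier_fraction=0.05):
    """Score each sample by the distance to its k-th nearest neighbor and
    flag the highest-scoring fraction of samples (at least one)."""
    scores = []
    for i, p in enumerate(data):
        dists = sorted(math.dist(p, q) for j, q in enumerate(data) if j != i)
        scores.append(dists[k - 1])
    n_flag = max(1, round(outlier_fraction * len(data)))
    ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flag])


print("per-feature z-score flags:", zscore_flags(samples))
print("k-NN distance flags:", knn_distance_flags(samples))
```

In this toy case the per-feature z-score test flags nothing, because the odd sample is unremarkable in each feature taken alone, whereas the neighbor-based score, which looks at all features jointly, isolates it.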
               Outliers in a dataset can be detected using either supervised or unsupervised
            ML techniques. In supervised ODT, outlier detection is treated as a
            classification problem. The outlier-detection model is trained on a dataset
            with samples prelabeled as either normal data (inliers) or outliers. The
            trained model then assigns labels to the samples in a new, unseen, unlabeled
            dataset as either inliers or outliers based on what was learned from the
            training dataset. Supervised ODT is robust when the model is exposed to a
            large, statistically diverse
            training set (i.e., a dataset that contains every possible instance of
            normal/inlier and outlier samples), whose samples are accurately labeled as
            normal/inlier or outlier. Unfortunately, such a training set is difficult,
            time-consuming, and sometimes impossible to obtain because it requires
            significant human expertise in labeling and expensive data acquisition. In
            contrast, unsupervised ODT overcomes the requirement of a labeled dataset.
            Unsupervised
            ODTs generally assume the following: (1) the number of outliers is much
            smaller than the number of normal samples, and (2) outliers do not follow the
            overall “trend” in the dataset. Popular outlier detection techniques are
            listed in Appendix A.
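
The two approaches described above can be sketched side by side. This is a minimal illustration under stated assumptions: the labeled training set and the unlabeled samples are invented toy data, the supervised detector is a 1-nearest-neighbor classifier, and the unsupervised detector scores samples by their distance from the feature-wise median. Neither is presented as the specific technique listed in Appendix A, and the names `classify`, `unsupervised_flags`, and `outlier_fraction` are hypothetical.

```python
import math
import statistics

# Hypothetical labeled training set: (features, label).
train = [
    ((0.0, 0.0), "inlier"), ((0.5, -0.3), "inlier"), ((-0.4, 0.2), "inlier"),
    ((0.3, 0.6), "inlier"), ((-0.2, -0.5), "inlier"),
    ((5.0, 5.0), "outlier"), ((-6.0, 4.0), "outlier"),
]


def classify(sample):
    """Supervised ODT as classification: copy the label of the nearest
    training sample (1-nearest-neighbor)."""
    _, label = min(train, key=lambda item: math.dist(item[0], sample))
    return label


def unsupervised_flags(data, outlier_fraction=0.15):
    """Unsupervised ODT relying on the two assumptions in the text:
    (1) outliers are few, so the feature-wise median describes the
    normal samples; (2) outliers lie far from that overall trend.
    Flags the highest-scoring fraction of samples (at least one)."""
    center = tuple(statistics.median(col) for col in zip(*data))
    scores = [math.dist(p, center) for p in data]
    n_flag = max(1, round(outlier_fraction * len(data)))
    ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flag])


# New, unlabeled samples. The third one is an outlier of a kind that the
# training set never contained.
new_samples = [(0.1, -0.2), (5.2, 4.9), (6.0, -6.0)]
print([classify(s) for s in new_samples])

# Unlabeled dataset containing the same unseen kind of outlier (index 6).
unlabeled = [(0.1, -0.2), (0.4, 0.3), (-0.3, 0.1), (0.2, -0.4),
             (-0.1, 0.5), (0.0, 0.1), (6.0, -6.0)]
print(unsupervised_flags(unlabeled))
```

On this toy data the classifier labels the third new sample as an inlier, since no similar outlier appears in its training set, while the unsupervised score flags it without any labels. The `outlier_fraction` argument plays the role of the assumed share of outliers; libraries such as scikit-learn expose a comparable setting (for example, the `contamination` parameter of `LocalOutlierFactor` and `IsolationForest`).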
               Both supervised and unsupervised ODTs are used in various industries. For
            instance, in credit fraud detection, neural networks are trained on all known
            fraudulent and legitimate transactions, and every new transaction is assigned
            a fraudulent or legitimate label by the model based on the information from
            the training dataset. A fraud detector could also be built in an unsupervised
            manner by flagging transactions that are dissimilar from what is normally
            encountered. In medical diagnosis, ODTs are used in early detection and
            diagnosis of certain
            diseases by analyzing the patient data (e.g., blood pressure, heart rate, and insu-
            lin level) to find patients for whom the measurements deviate significantly from
            the normal conditions. Zengyou et al. [2] used a cluster-based local outlier
            factor algorithm to detect malignant breast cancer by training their model on
            features related to breast cancer. ODTs are also used in detecting
            irregularities in the functioning of the heart by analyzing the measurements
            from an electrocardiogram (ECG) for purposes of early diagnosis of certain
            heart diseases. In the oil and gas industry, Chaudhary et al. [3] were able
            to improve the performance of the stretched exponential production decline
            (SEPD) model by using the local outlier factor method to detect and remove
            outliers from production data.
            In another oil and gas application, Luis et al. [4] used a one-class support
            vector machine (OCSVM) to detect possible operational issues in offshore
            turbomachinery, such as pumps and compressors, by detecting anomalous signals
            from their sensors. When implementing an unsupervised ODT, prior knowledge of
            the expected fraction of outliers improves the accuracy of outlier detection. In