Page 34 - Machine Learning for Subsurface Characterization

Unsupervised outlier detection techniques Chapter  1 19


             in Dataset #1. The comparative study on Dataset #2 involved experiments with
             five distinct feature subsets drawn from the available GR, RHOB, DTC, RT,
             RXO, and NPHI logs. The five feature subsets are referred to as FS1, FS2,
             FS2**, FS3, and FS4, where FS1 contains GR, RHOB, and DTC; FS2 contains
             GR, RHOB, and RT; FS2** contains GR, RHOB, and RXO; FS3 contains GR,
             RHOB, DTC, and RT; and FS4 contains GR, RHOB, DTC, and NPHI. These
             subsets were used to analyze how the choice of features affects the performance
             of the four unsupervised ODTs. Fig. 1.4B is a 3D scatterplot of Dataset #2 for
             the subset FS1, where blue points (dark gray in the print version) are the known
             inliers and the red points (light gray in the print version) are the known outliers.
             The outlier points in this dataset are mostly concentrated at one end of the
             plot, indicating that bad-hole conditions pushed the log responses toward a
             particular region, similar to collective outliers. The inlier points in Fig. 1.4A
             and B are the same. As mentioned earlier, the outliers in this dataset are
             identified on the basis of hole size; unsurprisingly, they are mostly located in
             the shale formation, which is susceptible to washouts and breakouts. The
             outliers form a cluster at the edge of the inlier data, with some outlier points
             randomly spread across the plot.
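The feature-subset design described above amounts to a lookup from subset name to log mnemonics. The following Python sketch is illustrative only and is not the authors' code: the names FEATURE_SUBSETS_DS2, select_subset, and COMMON_LOGS are hypothetical, and each depth is assumed to be stored as a dict mapping a log mnemonic to its value.

```python
# Illustrative sketch only: the subset definitions follow the text, but the
# names FEATURE_SUBSETS_DS2, select_subset, and COMMON_LOGS are hypothetical.
FEATURE_SUBSETS_DS2 = {
    "FS1": ["GR", "RHOB", "DTC"],
    "FS2": ["GR", "RHOB", "RT"],
    "FS2**": ["GR", "RHOB", "RXO"],
    "FS3": ["GR", "RHOB", "DTC", "RT"],
    "FS4": ["GR", "RHOB", "DTC", "NPHI"],
}


def select_subset(depth_rows, subset_name, subsets=FEATURE_SUBSETS_DS2):
    """Build a feature matrix (one row per depth) restricted to one subset.

    Each element of depth_rows is assumed to be a dict mapping a log
    mnemonic (e.g. "GR") to its measured value at that depth.
    """
    logs = subsets[subset_name]
    return [[row[log] for log in logs] for row in depth_rows]


# Logs shared by every subset; this evaluates to {"GR", "RHOB"}.
COMMON_LOGS = set.intersection(*(set(logs) for logs in FEATURE_SUBSETS_DS2.values()))
```

Computing the intersection confirms that all five subsets share GR and RHOB, so the subsets differ only in which of DTC, RT, RXO, and NPHI are added.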


             4.3.3 Dataset #3: Containing shaly layers and bad holes
             with noisy measurements
             Dataset #3 was constructed from the onshore dataset to compare the
             performances of the four unsupervised outlier detection techniques in detecting
             thin shale layers/beds in the presence of noisy and bad-hole depths. Dataset #3
             comprises GR, RHOB, DTC, RT, and NPHI responses from 201 depth points in a
             sandstone bed, 201 depth points in a limestone bed, 201 depth points in a
             dolostone bed, and 101 depth points in a shale bed of the onshore dataset.
             These 704 depths constitute the inliers. Thirty bad-hole depths with borehole
             diameter >12 in. and 40 synthetic noisy log responses are the outliers, which are
             combined with the 704 inliers to form Dataset #3. Consequently, Dataset #3
             contains 774 samples in total, of which 70 are outliers. The comparative study
             on Dataset #3 involved experiments with four distinct feature subsets drawn
             from the available GR, RHOB, DTC, RT, and NPHI logs, namely FS1, FS2, FS3,
             and FS4, similar to those used for Dataset #2. FS1 contains GR, RHOB,
             and DTC; FS2 contains GR, RHOB, and RT; FS3 contains GR, RHOB, DTC,
             and RT; and FS4 contains GR, RHOB, DTC, and NPHI. Fig. 1.4C is a 3D scat-
             terplot of Dataset #3 for the subset FS1, where blue points (dark gray in the print
             version) are the known inliers and the red points (light gray in the print version)
             are the known outliers. Three separate inlier clusters (dolostone and limestone,
             sandstone, and shale) and two outlier clusters (noise and bad-hole points) are
             observed in Fig. 1.4C. The noise points are randomly spread around the inlier
             clusters, while the bad-hole points form a cluster close to, but distinct from,
             the shale inlier cluster.
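The sample counts for Dataset #3 can be checked with a short bookkeeping sketch. The group sizes and the >12 in. bad-hole criterion come from the text; the names, the 0/1 label convention (1 = outlier), and expressing the outlier fraction as a contamination ratio are assumptions for illustration. No log values are generated here.

```python
# Illustrative bookkeeping for Dataset #3. Group sizes and the 12 in. threshold
# follow the text; all names and the 0 = inlier / 1 = outlier convention are assumed.
INLIER_GROUPS = {"sandstone": 201, "limestone": 201, "dolostone": 201, "shale": 101}
OUTLIER_GROUPS = {"bad_hole": 30, "synthetic_noise": 40}
BAD_HOLE_DIAMETER_IN = 12.0


def is_bad_hole(borehole_diameter_in):
    """Flag a depth as bad hole when the borehole diameter exceeds 12 in."""
    return borehole_diameter_in > BAD_HOLE_DIAMETER_IN


def build_labels(inlier_groups=INLIER_GROUPS, outlier_groups=OUTLIER_GROUPS):
    """Return one label per sample: 0 for each inlier depth, 1 for each outlier."""
    labels = [0] * sum(inlier_groups.values())
    labels += [1] * sum(outlier_groups.values())
    return labels


labels = build_labels()
n_samples = len(labels)                 # 704 inliers + 70 outliers = 774
n_outliers = sum(labels)                # 30 bad-hole + 40 noisy = 70
contamination = n_outliers / n_samples  # fraction of outliers, about 0.090
```

An outlier fraction of roughly 9% is the kind of prior that detectors such as scikit-learn's IsolationForest or LocalOutlierFactor accept through their contamination parameter; whether the four ODTs in this chapter were configured with such a value is not stated in this section.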