Page 34 - Machine Learning for Subsurface Characterization
Unsupervised outlier detection techniques Chapter 1 19
in Dataset #1. Comparative study on Dataset #2 involved experiments with five
distinct feature subsets sampled from the available features GR, RHOB, DTC,
RT, RXO, and NPHI logs. The five feature subsets are referred to as FS1, FS2,
FS2**, FS3, and FS4, where FS1 contains GR, RHOB, and DTC; FS2 contains
GR, RHOB, and RT; FS2** contains GR, RHOB, and RXO; FS3 contains GR,
RHOB, DTC, and RT; and FS4 contains GR, RHOB, DTC, and NPHI. The five
feature subsets were used to analyze the effects of features on the performances
of the four unsupervised ODTs. Fig. 1.4B is a 3D scatterplot of Dataset #2 for
the subset FS1, where blue points (dark gray in the print version) are the known
inliers and the red points (light gray in the print version) are the known outliers.
The outlier points in this dataset are mostly concentrated at one end of the plot,
which indicates that bad-hole conditions push the log responses toward one
region, similar to collective outliers. The inlier points in Fig. 1.4A and B are the
same. As mentioned earlier, the outlier points in this dataset are defined by hole
size; unsurprisingly, they are mostly located in shale formations, which are
susceptible to washouts and breakouts. The outliers form a cluster at the edge of
the inlier data, with some outlier points spread randomly across the plot.
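The feature subsets listed above for Dataset #2 can be written down as a simple lookup table. The following is a minimal sketch, not the authors' code: the subset contents come from the text, while the helper name select_features and the two sample readings are hypothetical and used only for illustration.

```python
# Logs available for Dataset #2, as listed in the text.
AVAILABLE_LOGS = ["GR", "RHOB", "DTC", "RT", "RXO", "NPHI"]

# Feature subsets for Dataset #2, as listed in the text.
FEATURE_SUBSETS_DS2 = {
    "FS1": ["GR", "RHOB", "DTC"],
    "FS2": ["GR", "RHOB", "RT"],
    "FS2**": ["GR", "RHOB", "RXO"],
    "FS3": ["GR", "RHOB", "DTC", "RT"],
    "FS4": ["GR", "RHOB", "DTC", "NPHI"],
}


def select_features(samples, subset_name, subsets=FEATURE_SUBSETS_DS2):
    """Hypothetical helper: keep only the logs named in the chosen subset."""
    logs = subsets[subset_name]
    return [[sample[log] for log in logs] for sample in samples]


# Made-up log readings at two depths (illustrative values, not real data).
samples = [
    {"GR": 45.0, "RHOB": 2.45, "DTC": 70.0, "RT": 12.0, "RXO": 9.0, "NPHI": 0.18},
    {"GR": 110.0, "RHOB": 2.30, "DTC": 95.0, "RT": 3.0, "RXO": 2.5, "NPHI": 0.32},
]

# One row per depth, one column per log in FS1 (GR, RHOB, DTC).
fs1_matrix = select_features(samples, "FS1")
```

Each subset yields a different feature matrix for the same depths, so the same detector can be run once per subset to compare the effect of the chosen logs.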
4.3.3 Dataset #3: Containing shaly layers and bad holes with noisy measurements
Dataset #3 was constructed from the onshore dataset to compare the
performances of the four unsupervised outlier detection techniques in detecting
thin shale layers/beds in the presence of noisy and bad-hole depths. Dataset #3
comprises GR, RHOB, DTC, RT, and NPHI responses from 201 depth points from a
sandstone bed, 201 depth points from a limestone bed, 201 depth points from a
dolostone bed, and 101 depth points from a shale bed of the onshore dataset.
These 704 depths constitute the inliers. Thirty bad-hole depths with borehole
diameter >12″ and 40 synthetic noisy log responses are the outliers that are
combined with the 704 inliers to form Dataset #3. Consequently, Dataset #3
contains 774 samples in total, of which 70 are outliers. Comparative study
on Dataset #3 involved experiments with four distinct feature subsets sampled
from the available features GR, RHOB, DTC, RT, and NPHI logs, namely, FS1,
FS2, FS3, and FS4, defined as for Dataset #2. FS1 contains GR, RHOB,
and DTC; FS2 contains GR, RHOB, and RT; FS3 contains GR, RHOB, DTC,
and RT; and FS4 contains GR, RHOB, DTC, and NPHI. Fig. 1.4C is a 3D
scatterplot of Dataset #3 for the subset FS1, where blue points (dark gray in the
print version) are the known inliers and the red points (light gray in the print
version) are the known outliers. Three separate inlier clusters (dolostone and
limestone, sandstone, and shale) and two clusters of outliers, noise and bad-hole
points, are observed in Fig. 1.4C. Noise data are randomly spread around the
inlier clusters, while the bad-hole data form a cluster close to but distinct from
the shale inlier cluster.
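The composition of Dataset #3 can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text. The variable names and the 0/1 label encoding (0 for inlier, 1 for outlier) are assumptions made for illustration, not the authors' convention.

```python
# Inlier depth counts per lithology, as stated in the text.
INLIER_COUNTS = {"sandstone": 201, "limestone": 201, "dolostone": 201, "shale": 101}

# Outlier counts by type, as stated in the text.
OUTLIER_COUNTS = {"bad_hole": 30, "synthetic_noise": 40}

n_inliers = sum(INLIER_COUNTS.values())
n_outliers = sum(OUTLIER_COUNTS.values())
n_total = n_inliers + n_outliers

# Assumed label encoding for evaluation: 0 = known inlier, 1 = known outlier.
labels = [0] * n_inliers + [1] * n_outliers

# Fraction of samples that are known outliers (roughly 9% of the dataset).
outlier_fraction = n_outliers / n_total
```

A known outlier fraction like this is the kind of value that some detectors accept as a prior, for example the contamination parameter of scikit-learn's IsolationForest; whether the study set it this way is not stated in this section.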