Page 33 - Machine Learning for Subsurface Characterization



FIG. 1.4 3D scatterplots of (A) Dataset #1, (B) Dataset #2, and (C) Dataset #3.

three distinct feature subsets sampled from the available features, namely GR,
RHOB, DTC, and RT logs. The three feature subsets are referred to as FS1, FS2,
and FS2*, where FS1 contains GR, RHOB, and DTC; FS2 contains GR, RHOB,
and RT; and FS2* contains GR, RHOB, and RT. Fig. 1.4A is a 3D scatterplot of
Dataset #1 for the subset FS1, such that the blue points (dark gray in the print
version) are the labeled known inliers and the red points (light gray in the print
version) are the labeled known outliers. The inlier samples form a cluster with
some sparse points around it, while the outliers are spread randomly and evenly
around the inlier cluster, similar to point outliers (Fig. 1.4A).
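
As a minimal sketch of how a feature subset such as FS1 or FS2 can be taken from
labeled samples, the snippet below projects each sample onto the logs of one
subset. The data are synthetic placeholders, not the book's Dataset #1; the
helper names (select_subset, make_synthetic_samples), the sample counts, and all
numeric ranges are assumptions chosen only to mimic a compact inlier cluster
surrounded by evenly scattered point outliers.

```python
import random

# Feature subsets named in the text, keyed by log mnemonic.
FEATURE_SUBSETS = {
    "FS1": ("GR", "RHOB", "DTC"),
    "FS2": ("GR", "RHOB", "RT"),
}


def select_subset(samples, subset_name):
    """Project each sample (a dict of log readings) onto one feature subset."""
    features = FEATURE_SUBSETS[subset_name]
    return [tuple(sample[f] for f in features) for sample in samples]


def make_synthetic_samples(n_inliers=200, n_outliers=20, seed=0):
    """Build placeholder data: clustered inliers (label 0), scattered outliers (label 1).

    All distributions and ranges here are illustrative assumptions.
    """
    rng = random.Random(seed)
    samples, labels = [], []
    for _ in range(n_inliers):
        # Inliers: readings drawn tightly around a common center.
        samples.append({
            "GR": rng.gauss(80.0, 10.0),
            "RHOB": rng.gauss(2.5, 0.05),
            "DTC": rng.gauss(70.0, 5.0),
            "RT": rng.lognormvariate(2.0, 0.5),
        })
        labels.append(0)
    for _ in range(n_outliers):
        # Outliers: readings drawn uniformly over a much wider range.
        samples.append({
            "GR": rng.uniform(0.0, 200.0),
            "RHOB": rng.uniform(1.8, 3.0),
            "DTC": rng.uniform(40.0, 140.0),
            "RT": rng.uniform(0.1, 1000.0),
        })
        labels.append(1)
    return samples, labels


samples, labels = make_synthetic_samples()
fs1_points = select_subset(samples, "FS1")  # (GR, RHOB, DTC) triples
print(len(fs1_points), sum(labels))
```

Each triple in fs1_points gives the three coordinates of one point in a 3D
scatterplot, and the matching entry in labels decides its color.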

4.3.2 Dataset #2: Containing bad holes
Dataset #2 was constructed from the onshore dataset to compare the performances
of the four unsupervised outlier detection techniques in detecting depths where
the log responses are adversely affected by large borehole sizes, also referred
to as bad holes. Dataset #2 comprises GR, RHOB, DTC, deep resistivity (RT),
shallow resistivity (RXO), and neutron porosity (NPHI) logs from the onshore
dataset for depths where the borehole diameter is between 7.8" and 8.2". The
depths in the onshore dataset where the borehole diameter is greater than 12"
were then added to Dataset #2 as point and collective outliers.
Consequently, Dataset #2 contains in total 4128 samples, out of which 91
are outliers and 4037 are inliers. Inliers in Dataset #2 are the same as those
