

by using multipoint compressional wavefront travel-time measurements. Hyperparameter optimization is performed using grid search along with fivefold cross validation to ensure the generalization of the trained classifiers. The classification models will facilitate the categorization of materials containing discontinuities of various spatial characteristics based solely on multipoint measurements of compressional wavefront arrival times.
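
As a minimal sketch of this tuning workflow (assuming Python with scikit-learn; the arrays X and y are hypothetical placeholders standing in for the multipoint travel-time features and the discontinuity class labels), the grid search with fivefold cross validation might look like:

    # Minimal sketch: grid search with fivefold cross validation.
    # X and y are hypothetical placeholders for the multipoint
    # travel-time features and the discontinuity class labels.
    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.random((200, 8))           # placeholder feature matrix
    y = rng.integers(0, 3, size=200)   # placeholder class labels

    param_grid = {"n_neighbors": [1, 3, 5, 7, 9],   # candidate values of K
                  "p": [1, 2]}                      # Minkowski distance order
    search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)

Here cv=5 performs the fivefold cross validation; best_params_ then holds the hyperparameter combination with the highest mean validation score.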



4.1.1 K-nearest neighbors (KNN) classifier
For a new, unseen, unlabeled sample, KNN selects the K samples from the training/testing dataset that are most similar to the new sample. Similarity between two points is measured in terms of the Minkowski distance between them in the feature space. The new unlabeled sample is then assigned the majority class among the K neighboring samples selected from the training/testing dataset. KNN does not require an explicit training stage. KNN is a nonparametric model because it does not learn any parameters (e.g., weights or biases). Nonetheless, to ensure the generalizability of the KNN model, certain hyperparameters, such as K and the distance metric, need to be tuned to obtain good performance on the testing dataset. Increasing K leads to underfitting of the KNN model (high bias), whereas a low K makes the model easily affected by outliers, resulting in overfitting (high variance) (Fig. 9.14).
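
To make the neighbor selection and majority vote concrete, here is an illustrative from-scratch sketch in Python (the function names minkowski and knn_predict are hypothetical, not from the source; X_train and y_train are assumed to be NumPy arrays of labeled features and classes):

    # Illustrative from-scratch sketch of KNN prediction.
    import numpy as np
    from collections import Counter

    def minkowski(a, b, p=2):
        # Minkowski distance of order p between two feature vectors
        # (p=1 gives Manhattan distance, p=2 gives Euclidean distance).
        return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

    def knn_predict(X_train, y_train, x_new, k=3, p=2):
        # Distance from the new unlabeled sample to every labeled sample.
        dists = [minkowski(x, x_new, p) for x in X_train]
        # Indices of the K nearest neighbors in the feature space.
        nearest = np.argsort(dists)[:k]
        # Assign the majority class among those K neighbors.
        return Counter(y_train[nearest]).most_common(1)[0][0]

Because every labeled sample must be stored and scanned at prediction time, there is no explicit training stage; only the hyperparameters K and p need to be tuned.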

FIG. 9.14 Implementation of the KNN classifier with K = 3 on a dataset that has two features and three classes. All training/testing samples are represented as circles. Three new, unseen, unlabeled samples are represented as stars. The KNN algorithm finds the three training/testing samples that are closest to each new sample and then assigns the majority class of those neighbors as the class of the new sample.