Kermany et al. [44] used a transfer learning approach with a pretrained Inception V3 CNN architecture [55] serving as a fixed feature extractor. They achieved high performance (≈98%) in classifying a B-scan as nAMD, DME, early AMD, or normal retina, and near-perfect performance (AUC = 0.999) in identifying urgent referrals (nAMD and DME). Only the final softmax layer was trained, on 100,000 B-scans from 4686 patients, and the model was tested on 1000 B-scans (250 from each category) from 633 patients. A similar transfer learning approach [56] successfully detected nAMD from a central OCT B-scan and was trained with 1012 B-scans; an accuracy of 0.98 was achieved on a test set of 100 B-scans equally balanced between nAMD and healthy examples. OCT-level and patient-level performances were not reported in these two works.
Lee et al. [45] proposed a CNN trained from scratch on more than 100,000 B-scans to distinguish nAMD B-scans from normal scans. The model used the VGG16 network architecture [57]. A total of 80,839 B-scans (41,074 from AMD and 39,765 from normal eyes) were used for training and 20,163 B-scans (11,616 from AMD and 8547 from normal eyes) were used for validation. At the B-scan level, they achieved an AUC of 92.78% with an accuracy of 87.63%; at the macular OCT-scan level, an AUC of 93.83% with an accuracy of 88.98%. The authors of [58] also trained a deep learning architecture, GoogLeNet (Inception-v1) [59], from scratch, but with the goal of automatically determining the need for anti-VEGF retreatment rather than purely detecting the presence of fluid. After training on 153,912 B-scans, the prediction accuracy was 95.5% with an AUC of 96.8% on a test set of 5358 B-scans. At the OCT-scan level, an AUC of 98.8% and an accuracy of 94% were reported.
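For contrast with the transfer-learning sketch above, the following is a minimal from-scratch binary setup in the same style. The input size, optimizer settings, and single-sigmoid output parameterization are assumptions for illustration; they are not details taken from [45] or [58].

```python
# Training a VGG16-shaped network from scratch for binary
# nAMD-vs-normal B-scan classification. Hyperparameters are
# illustrative assumptions, not those of the cited papers.
import tensorflow as tf

model = tf.keras.applications.VGG16(
    weights=None,                      # no pretraining: all weights learned from OCT data
    input_shape=(224, 224, 3),
    classes=1,
    classifier_activation="sigmoid",   # single output: P(nAMD)
)
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9),
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
)
# model.fit(train_ds, validation_data=val_ds, epochs=...)
# OCT-scan-level scores can then be obtained by pooling the per-B-scan
# probabilities of a volume (one plausible aggregation; the papers'
# exact rules may differ).
```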
With an end-to-end image classification pipeline, there is an additional need to interpret the resulting decision. Typically, an occlusion test [60] is performed, in which a blank box is systematically moved across the image and the change in the output probabilities is recorded. The largest drop in probability is assumed to mark the region of interest that contributes most to the neural network's decision on the predicted diagnosis. When classifying an exudative disease, the highlighted areas should correspond to the fluid. Using such interpretability strategies, a coarse fluid segmentation can be achieved as a by-product of the image classification model. An example is shown in Fig. 5.
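A minimal sketch of such an occlusion test is given below. It assumes a Keras-style model returning class probabilities; the box size, stride, and fill value are illustrative choices, not values prescribed by [60].

```python
# Occlusion test: slide a blank box over the B-scan, re-run the
# classifier, and record the drop in the predicted class probability.
# High heatmap values mark regions most important to the decision.
import numpy as np

def occlusion_map(model, image, target_class, box=32, stride=16, fill=0.0):
    """Heatmap of probability drops for `target_class` under occlusion."""
    h, w = image.shape[:2]
    base_prob = model.predict(image[None])[0][target_class]
    heat = np.zeros((h, w), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.float32)
    for y in range(0, h - box + 1, stride):
        for x in range(0, w - box + 1, stride):
            occluded = image.copy()
            occluded[y:y + box, x:x + box] = fill          # blank box
            prob = model.predict(occluded[None])[0][target_class]
            heat[y:y + box, x:x + box] += base_prob - prob  # probability drop
            counts[y:y + box, x:x + box] += 1
    # Average overlapping contributions (batching the occluded images
    # would be faster; kept simple here).
    return heat / np.maximum(counts, 1)
```

Thresholding such a map yields the coarse, by-product fluid segmentation mentioned above.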

                         3.2.1   Traditional machine-learning approaches
General-purpose image descriptors were the state of the art for image recognition before the advent of deep learning. Liu et al. [61] used local binary patterns (LBP) with PCA dimensionality reduction to obtain histograms capable of encoding texture and shape information in retinal OCT images and their edge maps. The optimized model used a multiclass classifier in the form of multiple one-vs-all binary SVMs over four classes (macular edema, normal, macular hole, and AMD), trained on a dataset of 326 central OCT B-scans from 136 eyes.
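A simplified sketch of such a pipeline follows. It uses a single-scale uniform LBP histogram, whereas [61] combined multiple scales and edge maps; the LBP parameters and PCA dimensionality are illustrative assumptions.

```python
# LBP histogram + PCA + one-vs-all linear SVMs, in the spirit of
# Liu et al. [61]. Parameter values are illustrative, not theirs.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC  # one-vs-rest multiclass by default

def lbp_histogram(bscan, p=8, r=1):
    """Uniform LBP histogram of one grayscale B-scan (texture descriptor)."""
    codes = local_binary_pattern(bscan, P=p, R=r, method="uniform")
    hist, _ = np.histogram(codes, bins=p + 2, range=(0, p + 2), density=True)
    return hist

# X: one histogram per B-scan; y: labels for the four classes
# (macular edema, normal, macular hole, AMD).
# clf = make_pipeline(PCA(n_components=20), LinearSVC())
# clf.fit(X_train, y_train)
```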
Srinivasan et al. [62] used a method based on describing the B-scan content with multiscale