Kermany et al. [44] used a transfer learning approach with a pretrained Inception V3 CNN architecture [55] serving as a fixed feature extractor. They achieved high performance (≈98%) in classifying a B-scan as nAMD, DME, early AMD, or normal retina, and near-perfect performance (AUC = 0.999) in identifying urgent referrals (nAMD and DME). Only the final softmax layer was trained, on 100,000 B-scans from 4686 patients, and the model was tested on 1000 B-scans (250 from each category) from 633 patients. A similar transfer learning approach [56] successfully detected nAMD from a central OCT B-scan and was trained with 1012 B-scans; an accuracy of 0.98 was achieved on a test set of 100 B-scans equally balanced between nAMD and healthy examples. OCT-level and patient-level performances were not reported in these two works.
Lee et al. [45] proposed a CNN trained from scratch on more than 100,000 B-scans to distinguish nAMD B-scans from normal scans. The model used the VGG16 network architecture [57]. A total of 80,839 B-scans (41,074 from AMD and 39,765 from normal eyes) were used for training and 20,163 B-scans (11,616 from AMD and 8547 from normal eyes) were used for validation. At the B-scan level, they achieved an AUC of 92.78% with an accuracy of 87.63%; at the macular OCT-scan level, an AUC of 93.83% with an accuracy of 88.98%. The authors of [58] also trained a deep learning architecture, GoogLeNet (Inception-v1) [59], from scratch, but with the goal of automatically determining the need for anti-VEGF retreatment rather than purely detecting the presence of fluid. After training on 153,912 B-scans, the prediction accuracy was 95.5% with an AUC of 96.8% on a test set of 5358 B-scans. At the OCT-scan level, an AUC of 98.8% and an accuracy of 94% were reported.
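For contrast with the transfer-learning sketch above, the following is a minimal from-scratch binary setup in the same style. The input size, optimizer settings, and single-sigmoid output parameterization are assumptions for illustration; they are not details taken from [45] or [58].

```python
# Training a VGG16-shaped network from scratch for binary
# nAMD-vs-normal B-scan classification. Hyperparameters are
# illustrative assumptions, not those of the cited papers.
import tensorflow as tf

model = tf.keras.applications.VGG16(
    weights=None,                      # no pretraining: all weights learned from OCT data
    input_shape=(224, 224, 3),
    classes=1,
    classifier_activation="sigmoid",   # single output: P(nAMD)
)
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9),
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
)
# model.fit(train_ds, validation_data=val_ds, epochs=...)
# OCT-scan-level scores can then be obtained by pooling the per-B-scan
# probabilities of a volume (one plausible aggregation; the papers'
# exact rules may differ).
```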
With an end-to-end image classification pipeline, there is an additional need to interpret the resulting decision. Typically, an occlusion test [60] is performed, in which a blank box is systematically moved across the image and the change in the output probabilities is recorded. The largest drop in probability is assumed to mark the region of interest that contributes most to the neural network's decision on the predicted diagnosis. When classifying an exudative disease, the highlighted areas should correspond to the fluid. Using such interpretability strategies, a coarse fluid segmentation can be achieved as a by-product of the image classification model. An example is shown in Fig. 5.
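A minimal sketch of such an occlusion test is given below. It assumes a Keras-style model returning class probabilities; the box size, stride, and fill value are illustrative choices, not values prescribed by [60].

```python
# Occlusion test: slide a blank box over the B-scan, re-run the
# classifier, and record the drop in the predicted class probability.
# High heatmap values mark regions most important to the decision.
import numpy as np

def occlusion_map(model, image, target_class, box=32, stride=16, fill=0.0):
    """Heatmap of probability drops for `target_class` under occlusion."""
    h, w = image.shape[:2]
    base_prob = model.predict(image[None])[0][target_class]
    heat = np.zeros((h, w), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.float32)
    for y in range(0, h - box + 1, stride):
        for x in range(0, w - box + 1, stride):
            occluded = image.copy()
            occluded[y:y + box, x:x + box] = fill          # blank box
            prob = model.predict(occluded[None])[0][target_class]
            heat[y:y + box, x:x + box] += base_prob - prob  # probability drop
            counts[y:y + box, x:x + box] += 1
    # Average overlapping contributions (batching the occluded images
    # would be faster; kept simple here).
    return heat / np.maximum(counts, 1)
```

Thresholding such a map yields the coarse, by-product fluid segmentation mentioned above.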

                         3.2.1   Traditional machine-learning approaches
General-purpose image descriptors were the state of the art for image recognition before the advent of deep learning. Liu et al. [61] used local binary patterns (LBP) with PCA dimensionality reduction to obtain histograms capable of encoding texture and shape information in retinal OCT images and their edge maps. The optimized model used a multiclass classifier in the form of multiple one-vs-all binary SVMs over four classes (macular edema, normal, macular hole, and AMD), trained on a dataset of 326 central OCT B-scans from 136 eyes.
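A simplified sketch of such a pipeline follows. It uses a single-scale uniform LBP histogram, whereas [61] combined multiple scales and edge maps; the LBP parameters and PCA dimensionality are illustrative assumptions.

```python
# LBP histogram + PCA + one-vs-all linear SVMs, in the spirit of
# Liu et al. [61]. Parameter values are illustrative, not theirs.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC  # one-vs-rest multiclass by default

def lbp_histogram(bscan, p=8, r=1):
    """Uniform LBP histogram of one grayscale B-scan (texture descriptor)."""
    codes = local_binary_pattern(bscan, P=p, R=r, method="uniform")
    hist, _ = np.histogram(codes, bins=p + 2, range=(0, p + 2), density=True)
    return hist

# X: one histogram per B-scan; y: labels for the four classes
# (macular edema, normal, macular hole, AMD).
# clf = make_pipeline(PCA(n_components=20), LinearSVC())
# clf.fit(X_train, y_train)
```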
Srinivasan et al. [62] used a method based on describing the B-scan content with multiscale