Page 134 - Computational Retinal Image Analysis
5 Deep learning based methods
Since 2017, a number of methods have been described for retinal cell layer prediction
in OCT using deep learning. Two major approaches to this task have emerged.
The first treats prediction as a dense, pixelwise task, in which every
pixel in the OCT is assigned directly to a retinal layer class; this is the canonical
segmentation task. The second approach identifies boundaries between layers without
identifying the classes themselves. These methods need an additional step to, on the
one hand, extract the actual boundary from a probability map and, on the other hand,
identify the classes of the layers that the boundaries separate.
The number and types of segmented retinal layers vary significantly between
published works (see Fig. 1 for an overview of possible layers). While the definitions
of the layers remain the same, various works bundle layers together, commonly NFL
to BM as total retinal thickness, or RPE and PR.
5.1 Preprocessing and augmentation
To reduce variability in OCT data, some methods apply image preprocessing. A com-
mon preprocessing step is flattening of the B-scan: Bruch's membrane is identified
and each A-scan is rolled to a predetermined vertical position [20–23].
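As a minimal sketch of this flattening step (function and variable names are our own; the cited methods differ in how the membrane is detected), each A-scan can be shifted so that the detected Bruch's membrane row lands at a fixed position:

```python
import numpy as np

def flatten_bscan(bscan, bm_rows, target_row):
    """Roll each A-scan (column) so the detected Bruch's membrane row
    bm_rows[col] ends up at target_row. Illustrative sketch only."""
    flat = np.empty_like(bscan)
    for col in range(bscan.shape[1]):
        flat[:, col] = np.roll(bscan[:, col], target_row - bm_rows[col])
    return flat

# Toy example: a tilted bright line (the "membrane") becomes horizontal.
bscan = np.zeros((10, 3))
bm_rows = np.array([2, 4, 6])              # detected membrane row per A-scan
bscan[bm_rows, np.arange(3)] = 1.0
flat = flatten_bscan(bscan, bm_rows, target_row=5)
```

After flattening, all membrane pixels lie in the same row, which removes retinal curvature as a source of variability for the network.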
As is common in medical data, images with an annotated ground truth are scarce.
Authors propose to augment the images to make the trained networks more robust to
variability: rolling using a parabola, rotation around the center, changes in illumina-
tion, vertical and horizontal translation, scale changes, horizontal flip, mild shearing,
additive noise and Gaussian blur. Augmentations are kept realistic in terms of how an
OCT device might deteriorate or change an acquired image (Fig. 5).
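A few of the listed augmentations can be chained as in the following illustrative sketch (the function name and parameter ranges are assumptions, not taken from the cited works):

```python
import numpy as np

def augment(bscan, rng):
    """Apply mild, device-plausible augmentations to a single B-scan
    (rows = depth, columns = A-scans). Illustrative sketch only."""
    img = bscan.astype(np.float32)
    if rng.random() < 0.5:
        img = img[:, ::-1]                                 # horizontal flip
    img = np.roll(img, rng.integers(-5, 6), axis=0)        # vertical translation
    img = img * rng.uniform(0.9, 1.1)                      # illumination change
    img = img + rng.normal(0.0, 0.01, size=img.shape)      # additive noise
    return img

rng = np.random.default_rng(0)
augmented = augment(np.zeros((64, 32)), rng)
```

Keeping the magnitudes small (a few pixels of shift, ±10% intensity) is what keeps such augmentations within the range of plausible acquisition variation.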
5.2 Pixelwise semantic segmentation methods
Pixelwise semantic image segmentation methods assign each pixel/voxel of an OCT
image a layer class. Most proposed deep learning methods make use of a U-Net
[24] or variations of it, with different kinds of convolutions, up- and downsampling,
depth, dropout, residual connections, and/or batch normalization. In the following,
we present recent advancements.
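Independent of the particular architecture, the output of such a network is a per-pixel class score map; a minimal sketch (with made-up shapes and random scores standing in for network output) of turning scores into a label map:

```python
import numpy as np

# Hypothetical network output for one B-scan:
# per-pixel scores of shape (num_classes, height, width).
num_classes, height, width = 4, 6, 8
scores = np.random.default_rng(0).normal(size=(num_classes, height, width))

# Pixelwise semantic segmentation: each pixel takes its highest-scoring class.
label_map = scores.argmax(axis=0)          # shape (height, width)
```

The label map has the spatial resolution of the input, which is what distinguishes this approach from the boundary-based methods described above.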
The ReLayNet by Roy et al. [25] modifies the U-Net by replacing the deconvolution
decoder branch with unpooling layers. It uses the indices from the encoder's
max-pooling layers to place upscaled values at these positions, while filling the
remaining gaps with zeros. Instead of training on full B-scans, the authors propose
vertical slices of constant width (bands), which allows larger mini-batch sizes at the
cost of losing context. A weighted multi-class logistic loss combined with a smooth
Dice loss is used to optimize the network. Layer boundaries receive a higher weight
to focus training on hard-to-identify tissue transitions/borders. As a preprocessing
step, the RPE is extracted using traditional methods to vertically align all A-scans
(flattening). ReLayNet segments seven retinal layers and fluids. Ben-Cohen et al. [20]