overlap related to intersection over union between two sets X and Y, corresponding
to the segmented pixels and the ground truth.
$$
\mathrm{DSC} = \frac{2\,|X \cap Y|}{|X| + |Y|} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FN} + \mathrm{FP}} = 2 \cdot \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \tag{1}
$$
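To make Eq. (1) concrete, the following is a minimal sketch in Python/NumPy of the DSC between two binary masks; the function name and the convention of scoring two empty masks as 1.0 are our assumptions, not taken from the text.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient (Eq. 1) between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()  # |X ∩ Y| = TP
    denom = pred.sum() + truth.sum()                  # |X| + |Y| = 2TP + FP + FN
    if denom == 0:
        return 1.0  # both masks empty: perfect agreement by convention (our choice)
    return float(2.0 * intersection / denom)
```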
A downside of DSC is its high sensitivity to errors when only a small amount of
fluid is present in the image. Thus, additional metrics with a direct clinical interpretation
are often considered, for example, fluid volume. Xu et al. [34] proposed a number
of such clinically oriented measures of segmentation performance. They computed
a Bland-Altman plot to measure the method’s bias and the limits of agreement with
the gold standard volumes. In addition to comparing volumes, they displayed scatter
plots comparing properties of individual fluid pockets between automated and gold
standard segmentations, such as the average distance to the BM, the average distance
to the fovea, and their count stratified by fluid volume.
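As an illustration of such clinically oriented measures, the following sketch computes fluid volume from a binary mask and the Bland-Altman bias and 95% limits of agreement for paired volumes; the function names, the voxel-size parameter, and the conventional 1.96-SD limits are our assumptions, not the exact procedure of Xu et al. [34].

```python
import numpy as np

def fluid_volume_mm3(mask: np.ndarray, voxel_size_mm: tuple) -> float:
    """Fluid volume in mm^3 from a binary mask and per-axis voxel size in mm (hypothetical helper)."""
    return float(mask.astype(bool).sum() * np.prod(voxel_size_mm))

def bland_altman_stats(auto_vols, gold_vols):
    """Bias and 95% limits of agreement between paired automated and gold standard volumes."""
    diff = np.asarray(auto_vols, dtype=float) - np.asarray(gold_vols, dtype=float)
    bias = diff.mean()
    spread = 1.96 * diff.std(ddof=1)  # conventional 1.96*SD limits of agreement
    return bias, (bias - spread, bias + spread)
```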
Several datasets have been prepared to enable evaluation and comparison of different
automated algorithms. A publicly available dataset released by Duke contains
110 B-scans acquired with Spectralis OCT from 10 patients with DME and
annotated by two experts [24]. As part of the OPTIMA Cyst Segmentation Challenge
[51], organized during the MICCAI 2015 conference, four teams compared their
machine-learning methods for segmenting IRF using fully and doubly annotated
OCT volumes, with a training set of 15 and a test set of 15 OCT volumes
acquired with devices from four vendors: Topcon, Cirrus, Nidek, and
Spectralis. In the RETOUCH challenge [52], a satellite event of MICCAI 2017,
eight teams compared their deep-learning methods for IRF, SRF, and PED segmentation
using 70 fully annotated OCT scans for training and 42 doubly annotated OCT
scans for testing, with the scans encompassing two retinal diseases (i.e., nAMD and
RVO) and three OCT vendors (i.e., Cirrus, Spectralis, and Topcon).
The datasets that have been annotated by multiple observers offer an insight into
the interobserver variability and into the performance ceiling when evaluating
segmentation methods trained from manual annotations. Xu et al. [34] measured a
high agreement of $R^2 = 0.98$ between the fluid volumes extracted by two experts.
Using the more stringent DSC metric, Lee et al. [22] reported a mean DSC for human
interrater reliability of 0.75. In RETOUCH, an intercenter DSC of 0.73 (±0.17) was
reported on a large test dataset. This illustrates the difficulty of even the manual
annotation of fluid on OCT.
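As a minimal sketch of how such agreement figures can be computed from paired expert annotations: reading $R^2$ as the squared Pearson correlation between the two raters' volumes is our assumption, and the exact definitions used in the cited studies may differ.

```python
import numpy as np

def volume_r2(vols_a, vols_b) -> float:
    """Squared Pearson correlation between two raters' fluid volumes (one reading of R^2)."""
    r = np.corrcoef(vols_a, vols_b)[0, 1]
    return float(r ** 2)

def mean_interrater_dsc(masks_a, masks_b) -> float:
    """Mean DSC (Eq. 1) over paired binary masks from two raters."""
    scores = []
    for a, b in zip(masks_a, masks_b):
        a, b = a.astype(bool), b.astype(bool)
        denom = a.sum() + b.sum()
        scores.append(1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom)
    return float(np.mean(scores))
```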
3 OCT fluid detection
Management of exudative macular diseases is largely based on the ability to detect
retinal fluid on OCT, both for initial diagnosis and as a retreatment indication.
Detection of fluid on OCT has thus become a routine clinical task, but reliable evaluation
of leakage activity is difficult and subjective when performed manually, creating
a risk of missed fluid and of subjectivity influencing the retreatment decision.