Page 164 - Computational Retinal Image Analysis
2 Challenges
2.2 Annotation tasks are often unfamiliar to clinicians
Some annotations required to validate image analysis algorithms, especially the tracing
of detailed contours of regions of interest, are not part of normal clinical practice,
or are not produced in the form ideally needed to validate retinal image analysis (RIA)
algorithms. Specialists must therefore find extra time in their busy schedules; in the
worst case, images are annotated without sufficient time and concentration.
2.3 Consistency is hard to achieve
The factors just mentioned, combined with the annotators' differing experience, are
likely to introduce inconsistency into the annotations: some annotators may trace
contours in great detail while others trace approximate ones; some may be more
conservative than others in scoring disease or highlighting lesions; and the same
specialist may change his or her decisions while going through an image set. For
annotations to be useful they must be consistent across different annotators
(inter-observer variability) and within the set created by a single annotator
(intra-observer variability). To mitigate the effects of such variations and capture
a representative set of opinions, one should ideally have access to teams of experts,
each annotating the same sets multiple times, so that both inter- and intra-observer
variability can be characterized statistically. However, one does not want to impose
overly strict constraints on annotators, as inter-observer variability contains
useful information; importantly, the range of variation among expert annotations
defines the best performance that an automatic system can meaningfully achieve.
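The statistical characterization of variability described above can be sketched as follows. This is an illustrative example, not a method prescribed in the text: it uses Dice overlap between binary contour masks, one common choice for region annotations; disease-grading annotations would instead call for agreement measures such as Cohen's kappa. All function names are hypothetical.

```python
import numpy as np

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice overlap between two binary annotation masks (1.0 = identical)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    inter = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * inter / total

def pairwise_dice(masks):
    """All pairwise Dice scores among a list of annotation masks.

    Applied to masks from different annotators, the scores sample
    inter-observer variability; applied to repeated annotations by a
    single expert, they sample intra-observer variability.
    """
    scores = []
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            scores.append(dice(masks[i], masks[j]))
    return scores
```

The mean and spread of such pairwise scores quantify the range of expert variation, and hence the ceiling of meaningful performance for an automatic system.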
2.4 Collecting annotations may be limited by data governance
Clinical images and data sets, especially large ones, may not be allowed outside their
hospital of origin, whether abroad or within the same country. In some cases, images
and data can be made available only within a certified safe haven (SH), which requires
strict information governance protocols to be adhered to. This forces one to install
and run annotation and processing software within the safe haven, which may restrict
the tools and computing resources available. Similar observations apply to cloud
processing of clinical data, a field which is evolving apace.
2.5 Image quality may vary across images and data sets
Variations in quality are caused by a plethora of factors, including differences in
instruments, operators, patients, and local image acquisition protocols and practices.
Differences in quality may introduce unwanted noise and lead to significantly
different results across images and data sets.
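One practical response to varying quality is to pre-screen images before annotation or analysis. The sketch below is a minimal, illustrative example, assuming grayscale images scaled to [0, 1]; the two metrics (variance of a discrete Laplacian as a blur proxy, RMS contrast) and the thresholds are hypothetical stand-ins for a proper quality-assessment pipeline and would need tuning per data set.

```python
import numpy as np

def sharpness(gray: np.ndarray) -> float:
    """Variance of a discrete (wrap-around) Laplacian: low values suggest blur."""
    lap = (np.roll(gray, 1, 0) + np.roll(gray, -1, 0)
           + np.roll(gray, 1, 1) + np.roll(gray, -1, 1) - 4 * gray)
    return float(lap.var())

def rms_contrast(gray: np.ndarray) -> float:
    """Root-mean-square contrast of a grayscale image in [0, 1]."""
    return float(gray.std())

def passes_quality(gray: np.ndarray,
                   min_sharpness: float = 1e-3,
                   min_contrast: float = 0.05) -> bool:
    """Crude screen; the default thresholds are illustrative only."""
    return sharpness(gray) >= min_sharpness and rms_contrast(gray) >= min_contrast
```

Flagging low-quality images up front makes it easier to separate genuine algorithmic differences from acquisition-driven noise.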
2.6 Absence of unambiguous ground truth
Famously, medical image analysis does not have the benefit of highly accurate,
objectively measured ground truth. Ground truth consists of gold-standard judgements
from experts (see Section 2.3), the true accuracy of which cannot be established
easily, if at all. This is not the case in other image analysis fields; in metrology,
for instance, the accuracy of an optical instrument measuring lengths within a target accuracy of, say,