±10 μm can be tested by comparisons with interferometric measurements accurate to a much higher level (nanometers). In MIA, by contrast, expert annotations always differ, depending among other factors on the annotator's experience and background, on image quality, and on the completeness and possible interpretations of the annotation protocol [2, 4, 5].
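To make annotator disagreement concrete, the short Python sketch below computes the Dice overlap between two expert delineations of the same vessel segment. It is purely illustrative: the masks, sizes, and offsets are invented, not taken from any study cited here; even careful experts typically score below 1.

import numpy as np

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice overlap between two binary masks: 2|A and B| / (|A| + |B|)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Two annotators outline the "same" vessel slightly differently
# (synthetic stand-ins for real annotations).
annotator_1 = np.zeros((64, 64), dtype=bool)
annotator_1[20:40, 30:34] = True   # 4-pixel-wide vessel segment
annotator_2 = np.zeros((64, 64), dtype=bool)
annotator_2[21:41, 29:34] = True   # shifted by one pixel, 5 pixels wide

print(f"Dice = {dice(annotator_1, annotator_2):.3f}")   # about 0.84, not 1.0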


2.7  Time-varying quantities are not well represented by a single measurement

Medical objects of interest naturally vary over time, adding further complexity to validation. For instance, changes in the width of the larger retinal arterioles over the cardiac cycle can be observed near the OD in small-field-of-view fundus camera videos [10]. Hence, arteriolar width would really be best described by a range of values; a single measurement taken at a random time is merely a random sample from that range. Incidentally, estimating the effect of this approximation on calculations and statistical analyses that use such approximate measurements is another open (and currently under-researched) challenge.
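As a hedged illustration (all parameters below are assumed for the sketch, not taken from [10]), the following Python snippet models arteriolar width as a sinusoid over the cardiac cycle and samples it at random phases, as a single fundus image effectively does. The spread of the samples is the variability that a single measurement hides.

import numpy as np

rng = np.random.default_rng(0)

MEAN_WIDTH_UM = 100.0   # assumed mean arteriolar width (illustrative)
PULSE_AMP_UM = 3.0      # assumed peak pulsation amplitude (illustrative)

def width_at(phase: np.ndarray) -> np.ndarray:
    """Width as a simple sinusoid of cardiac phase (phase in [0, 1))."""
    return MEAN_WIDTH_UM + PULSE_AMP_UM * np.sin(2 * np.pi * phase)

# Each "measurement" captures the vessel at a random cardiac phase.
samples = width_at(rng.random(10_000))
print(f"true mean width : {MEAN_WIDTH_UM:.2f} um")
print(f"sample mean     : {samples.mean():.2f} um")
print(f"sample std dev  : {samples.std():.2f} um")  # about PULSE_AMP_UM / sqrt(2)

Averaging many samples recovers the mean width, but any single image contributes an error of up to the full pulsation amplitude, which then propagates into downstream statistics.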


2.8  Test criteria and data sets are not uniform in the literature

RIA and MIA algorithms addressing the same problem (e.g., in retinal image analysis, optic disc contour detection, drusen localization, or blood vessel segmentation and classification) are often tested with different data sets and with different criteria (e.g., sensitivity, specificity, accuracy, area under the curve) or combinations thereof. This makes objective comparisons very difficult. A revealing (and somewhat concerning) analysis of the effect of changing performance criteria on the ranking of algorithms in international challenges is given in Maier-Hein et al. [4].
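The Python sketch below illustrates the point with entirely made-up confusion matrices for two hypothetical algorithms; it does not reproduce any analysis from [4], only shows how the "winner" can change with the chosen criterion.

def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Common binary-classification criteria from a confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical results on the same test set (invented numbers).
algo_a = metrics(tp=900, fp=300, fn=100, tn=8700)   # aggressive detector
algo_b = metrics(tp=700, fp=50,  fn=300, tn=8950)   # conservative detector

for name in ("sensitivity", "specificity", "accuracy"):
    winner = "A" if algo_a[name] > algo_b[name] else "B"
    print(f"{name:11s}: A={algo_a[name]:.3f}  B={algo_b[name]:.3f}  -> {winner}")

Here algorithm A wins on sensitivity while B wins on specificity and accuracy, so the apparent state of the art depends on which criterion a paper happens to report.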

2.9  Dependency on application/task

As noted in Trucco et al. [5], annotations of the same quantity or of the same image can vary depending on the task a clinician has in mind. For instance, the width of retinal vessels in fundus images may be systematically overestimated by doctors thinking surgically, who therefore aim to keep at a safe distance from the vessel. In such cases, it seems advisable to also group reference annotations by clinical task and to specify annotation protocols accordingly.

2.10  Human in the loop

Some RIA systems are semi-automatic, i.e., they require the intervention of a trained operator to generate results. Well-known examples are SIVA [11], VAMPIRE (including the fundus and SLO tools) [12–14], and IVAN [15], all used internationally by many clinical groups. Such systems generate rich quantitative characterizations of the morphometry of the retinal vasculature in fundus images, computing measurements defined a priori. Manual intervention by trained operators limits errors and inaccuracies, hence rejection rates and wrong data passed to statistical analysis, but complicates