±10 μm can be tested by comparisons with interferometric measurements accurate to a much higher level (nanometers). In MIA, by contrast, expert annotations always differ, depending among other factors on the annotator's experience and background, on image quality, and on the completeness and possible interpretations of the annotation protocol [2, 4, 5].
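To make annotator disagreement concrete, the short Python sketch below computes the Dice overlap between two expert delineations of the same vessel segment. It is purely illustrative: the masks, sizes, and offsets are invented, not taken from any study cited here; even careful experts typically score below 1.

import numpy as np

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice overlap between two binary masks: 2|A and B| / (|A| + |B|)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Two annotators outline the "same" vessel slightly differently
# (synthetic stand-ins for real annotations).
annotator_1 = np.zeros((64, 64), dtype=bool)
annotator_1[20:40, 30:34] = True   # 4-pixel-wide vessel segment
annotator_2 = np.zeros((64, 64), dtype=bool)
annotator_2[21:41, 29:34] = True   # shifted by one pixel, 5 pixels wide

print(f"Dice = {dice(annotator_1, annotator_2):.3f}")   # about 0.84, not 1.0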


2.7  Time-varying quantities are not well represented by a single measurement

Medical objects of interest naturally vary over time, adding further complexity to validation. For instance, changes in the width of the larger retinal arterioles over the cardiac cycle can be observed near the OD in small-field-of-view fundus camera videos [10]. Hence, arteriolar width would really be best described by a range of values; a single measurement taken at a random time is merely a random sample from that range. Incidentally, estimating the effect of this approximation on calculations and statistical analyses that use such approximate measurements is another open (and currently under-researched) challenge.
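As a hedged illustration (all parameters below are assumed for the sketch, not taken from [10]), the following Python snippet models arteriolar width as a sinusoid over the cardiac cycle and samples it at random phases, as a single fundus image effectively does. The spread of the samples is the variability that a single measurement hides.

import numpy as np

rng = np.random.default_rng(0)

MEAN_WIDTH_UM = 100.0   # assumed mean arteriolar width (illustrative)
PULSE_AMP_UM = 3.0      # assumed peak pulsation amplitude (illustrative)

def width_at(phase: np.ndarray) -> np.ndarray:
    """Width as a simple sinusoid of cardiac phase (phase in [0, 1))."""
    return MEAN_WIDTH_UM + PULSE_AMP_UM * np.sin(2 * np.pi * phase)

# Each "measurement" captures the vessel at a random cardiac phase.
samples = width_at(rng.random(10_000))
print(f"true mean width : {MEAN_WIDTH_UM:.2f} um")
print(f"sample mean     : {samples.mean():.2f} um")
print(f"sample std dev  : {samples.std():.2f} um")  # about PULSE_AMP_UM / sqrt(2)

Averaging many samples recovers the mean width, but any single image contributes an error of up to the full pulsation amplitude, which then propagates into downstream statistics.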


2.8  Test criteria and data sets are not uniform in the literature

RIA and MIA algorithms addressing the same problem (e.g., in retinal image analysis, optic disc contour detection, drusen localization, or blood vessel segmentation and classification) are often tested with different data sets and with different criteria (e.g., sensitivity, specificity, accuracy, area under the curve) or combinations thereof. This makes objective comparisons very difficult. A revealing (and somewhat concerning) analysis of the effect of changing performance criteria on the ranking of algorithms in international challenges is given in Maier-Hein et al. [4].
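The Python sketch below illustrates the point with entirely made-up confusion matrices for two hypothetical algorithms; it does not reproduce any analysis from [4], only shows how the "winner" can change with the chosen criterion.

def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Common binary-classification criteria from a confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical results on the same test set (invented numbers).
algo_a = metrics(tp=900, fp=300, fn=100, tn=8700)   # aggressive detector
algo_b = metrics(tp=700, fp=50,  fn=300, tn=8950)   # conservative detector

for name in ("sensitivity", "specificity", "accuracy"):
    winner = "A" if algo_a[name] > algo_b[name] else "B"
    print(f"{name:11s}: A={algo_a[name]:.3f}  B={algo_b[name]:.3f}  -> {winner}")

Here algorithm A wins on sensitivity while B wins on specificity and accuracy, so the apparent state of the art depends on which criterion a paper happens to report.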

2.9  Dependency on application/task

As noted in Trucco et al. [5], annotations of the same quantity or of the same image can vary depending on the task a clinician has in mind. For instance, the width of retinal vessels in fundus images may be systematically overestimated by doctors thinking surgically, who therefore aim to keep at a safe distance from the vessel. In such cases, it seems advisable to also group reference annotations by clinical task and to specify annotation protocols accordingly.

2.10  Human in the loop

Some RIA systems are semi-automatic, i.e., they require the intervention of a trained operator to generate results. Well-known examples are SIVA [11], VAMPIRE (including the fundus and SLO tools) [12–14], and IVAN [15], all used internationally by many clinical groups. Such systems generate rich quantitative characterizations of the morphometry of the retinal vasculature in fundus images, computing measurements defined a priori. Manual intervention by trained operators limits errors and inaccuracies, hence rejection rates and wrong data passed to statistical analysis, but complicates