Mission accomplished then? Hardly. Expanding the definition of “validation” above even just a little more [6], we realize that, for a well-specified image analysis problem (see above), one must:
  (i) procure clinically relevant, well-characterized data sets (of sufficient size);
  (ii) procure a sufficient quantity of annotations from well-characterized experts;
  (iii) compute automatic measurements;
  (iv) compare statistically the annotations from experts with the automatic results (a minimal sketch follows this list).
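As an illustration of step (iv), and only a sketch: assuming an ordinal grading task (e.g., diabetic retinopathy grades 0–4), one common statistical comparison between a single expert's grades and the automatic results is the quadratic-weighted Cohen's kappa. The arrays below are hypothetical illustration data, not figures from this chapter.

# Minimal sketch of step (iv): statistically comparing one expert's grades
# with automatic results on an ordinal grading task (grades 0-4 assumed).
from sklearn.metrics import cohen_kappa_score

expert_grades = [0, 1, 2, 2, 4, 3, 0, 1]   # one expert's annotations (invented)
auto_grades   = [0, 1, 2, 3, 4, 2, 0, 1]   # automatic measurements (invented)

# Quadratic weights penalize large grade disagreements more heavily than
# adjacent ones, a common choice for ordinal grades.
kappa = cohen_kappa_score(expert_grades, auto_grades, weights="quadratic")
print(f"Quadratic-weighted kappa: {kappa:.3f}")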
Even this limited expansion exposes some nontrivial questions. For instance, when exactly is a data set clinically relevant? What do we need to know to declare experts and data sets well characterized? How do we reconcile different annotations for the same images? What should we do, if anything, when different annotators disagree (the usual situation) before we compare their annotations with automatic results? What does sufficient quantity mean in practice?
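One simple, if imperfect, way to reconcile disagreeing annotators before comparing with automatic results is to form a consensus label, for instance by majority vote; real annotation protocols typically add an adjudication step for ties. The sketch below assumes categorical grades, and all names and numbers are hypothetical.

# Hypothetical sketch: reconciling several annotators' grades per image by
# majority vote before comparison with automatic results.
from collections import Counter

def majority_vote(grades):
    # Most frequent grade; ties are broken arbitrarily here. A real protocol
    # would define adjudication (e.g., by a senior grader).
    return Counter(grades).most_common(1)[0][0]

per_image_annotations = {
    "img_001": [2, 2, 3],  # grades from three annotators (illustrative)
    "img_002": [0, 1, 0],
}
consensus = {img: majority_vote(g) for img, g in per_image_annotations.items()}
print(consensus)  # {'img_001': 2, 'img_002': 0}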
If we think a little more broadly, further challenges appear. For example, the answers to the questions above change depending on whether we consider a proof-of-concept validation of a novel algorithm, say one suitable for publication in a research journal, or the actual translation of the technology into healthcare: the latter requires, among other things, much larger and more carefully selected (from a clinical point of view) patient cohorts, replication in multiple independent cohorts, and conformity with the rules of regulatory bodies such as the FDA in the United States or the EMA in Europe.
This chapter opens with a concise discussion of the issues that make validation a serious challenge (Section 2). It then reviews tools and techniques that we regard as good practice, including data selection and evaluation criteria (Section 3). Further discussion is devoted to the important question of designing annotation protocols for annotators (Section 4). The chapter closes with a summary and ideas for spreading good practice internationally (Section 5).



2  Challenges

   The gross national product measures everything, except what makes life worthwhile.
   (Robert Kennedy)

2.1  Annotations are expensive
Validating and training contemporary computational systems like deep learning systems requires larger and larger volumes of annotations [7–9], but annotating images is time-consuming, hence expensive in both time and money. The time of clinical practitioners is normally at a premium. The cost of an annotation task depends on what and how much must be annotated: for example, assigning a diabetic retinopathy grade to a fundus image is quicker than tracing the blood vessels in the same image with a software tool.
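As a rough illustration of how this cost scales with the task, consider a back-of-envelope estimate; all figures below (per-image times, hourly rate) are invented for illustration only.

# Back-of-envelope annotation cost sketch; all numbers are hypothetical.
def annotation_cost(n_images, minutes_per_image, hourly_rate):
    # Total cost for one annotator: images x hours per image x hourly rate.
    return n_images * (minutes_per_image / 60.0) * hourly_rate

# Grading is quick; vessel tracing is slow (times and rate are invented):
print(annotation_cost(1000, 1, 150))   # grading, ~1 min/image      -> 2500.0
print(annotation_cost(1000, 20, 150))  # vessel tracing, ~20 min    -> 50000.0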