Page 171 - Computational Retinal Image Analysis

166    CHAPTER 9  Validation




                         3.3  Validation on outcome: Focus on the clinical task
                         Validation on outcome aims to validate an algorithm or software tool within the
                         system of which it is a part. Consider, for instance, an algorithm detecting
                         microaneurysms in retinal fundus images, meant as a component of an automatic
                         refer/do not refer system [33]. Direct validation requires annotations of
                         individual microaneurysms. Validation on outcome requires only the referral
                         decision. All other conditions being equal during testing, the microaneurysm
                         detection module is validated successfully when the automatic referral
                         decisions achieve the desired accuracy.
                            An important advantage of this approach is that it avoids creating additional tasks
                         for doctors providing annotations. In our example, referral decisions are generated in
                         normal practice, but detailed annotations of lesions on images are not. A challenge is
                         that deciding what constitutes the “outcome” may not always be obvious [5].
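As a minimal sketch of validation on outcome, the snippet below compares automatic refer/do not refer decisions against the referral decisions recorded in clinical practice, without using any lesion-level annotations. The function name `referral_outcome_metrics`, the boolean encoding (`True` = refer) and the sample decisions are assumptions made for illustration; the source does not prescribe particular metrics, and accuracy, sensitivity and specificity are used here only as common choices.

```python
from typing import Dict, Sequence


def referral_outcome_metrics(
    predicted: Sequence[bool], reference: Sequence[bool]
) -> Dict[str, float]:
    """Compare automatic referral decisions with clinical ones (True = refer)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(reference) == 0:
        raise ValueError("at least one case is required")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # referred by both
    tn = sum(1 for p, r in pairs if not p and not r)  # not referred by either
    fp = sum(1 for p, r in pairs if p and not r)      # referred by system only
    fn = sum(1 for p, r in pairs if not p and r)      # missed referrals

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Undefined (NaN) when the reference set has no positive/negative cases.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical decisions for five patients, for illustration only.
automatic = [True, True, False, False, True]
clinical = [True, False, False, False, True]
metrics = referral_outcome_metrics(automatic, clinical)
print(metrics)
```

Under this view, the microaneurysm detector is never scored directly: swapping in a different detector and re-running the same comparison, with everything else held fixed, shows whether the referral outcome improves or degrades.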


                         4  Annotations and data, annotations as data

                            In God we trust; others must provide data
                                                                              Edwin R. Fisher
                            Further important elements of validation emerge if we step back from
                         the discussion so far and attempt to look at validation in all its aspects. We
                         briefly discuss a few of them in this section.


                         4.1  Annotation protocols and their importance
                         The collection of ground truth to validate RIA and MIA systems requires the de-
                         velopment of a protocol for annotating images or videos, in itself a complex task.
                         Various tasks are involved; we summarize the main ones below.
                         •  Protocol design. The protocol must be designed jointly by the clinical and
                            technical (MIA) team. Multiple clinicians ought to be involved [2, 3, 34].
                            A pilot study can, in our experience, help to identify key parameters of the
                            protocol: for instance, if an ordinal grading scale is involved (e.g., scoring
                            tortuosity, or the severity of a lesion), the optimal number of levels may be
                            identified not only on the basis of current clinical practice, but also of pilot
                            experiments suggesting the number yielding the most accurate results with
                            an automatic system. Hence the final number is obtained by discussion as a
                            compromise between the original one (clinical practice) and the result of the
                            pilot study.
                         •  Ground truth type. Once a protocol is agreed upon, the designers may simply
                            decide to record a separate set of measurements for each annotator, or they
                            may also define summary measurements that capture some form of consensus
                            among annotators and reconcile differences between their measurements.
                            Note that we use “consensus” in a general sense: generating a single value
                            from a set of differing ones (e.g., the tortuosity level of an artery given
                            the different estimates of, say, three annotators).
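To illustrate "consensus" in the general sense used above, the following sketch reduces the ordinal grades given by several annotators to a single value. The two rules shown (low median and strict majority vote), the function names and the integer grade encoding are assumptions chosen for illustration; the source does not prescribe a particular consensus rule.

```python
from collections import Counter
from statistics import median_low
from typing import Optional, Sequence


def consensus_median(grades: Sequence[int]) -> int:
    """Low median of ordinal grades; always returns a grade that was actually given."""
    if len(grades) == 0:
        raise ValueError("at least one grade is required")
    return median_low(grades)


def consensus_majority(grades: Sequence[int]) -> Optional[int]:
    """Most frequent grade, or None when there is no unique most frequent grade."""
    if len(grades) == 0:
        raise ValueError("at least one grade is required")
    counts = Counter(grades)
    top = max(counts.values())
    winners = [grade for grade, count in counts.items() if count == top]
    return winners[0] if len(winners) == 1 else None


# Hypothetical tortuosity grades for one artery from three annotators.
grades = [1, 2, 3]
print(consensus_median(grades))    # middle grade of the three
print(consensus_majority(grades))  # no unique mode, so no majority consensus
```

The low median is used rather than the mean so that the consensus stays on the ordinal scale. The majority rule can fail to produce a value, in which case a protocol would need a fallback, for example a discussion among annotators.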