Page 166 - Computational Retinal Image Analysis





                  the evaluation of the overall system and limits processing speed. Note that the
                  measurements generated by the systems mentioned above do not always agree; this
                  is being investigated by various groups [13, 16, 17].



                  3  Tools and techniques

                     If your only tool is a hammer, make all your problems look like nails.
                                                        Variant of the Law of the Instrument
                     In practice, comparing the results of an algorithm with the available ground truth
                  (annotations) means studying the statistical agreement or disagreement between two
                  sets of data [5, 18]. Statistics offers well-known instruments (see, e.g., Chapter 10),
                  but no standard protocol has ever been explicitly agreed upon by the international
                  RIA/MIA communities. In this section we concisely review the available techniques and
                  their rationale, starting with a few considerations on how to choose image sets.
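
As one illustration of "statistical agreement between two sets of data", the sketch below computes the bias and 95% limits of agreement in the style of a Bland-Altman analysis for paired measurements. The choice of this particular instrument, the function name bland_altman, and the sample values are assumptions made here for illustration; the text does not prescribe a specific measure.

```python
from statistics import mean, stdev


def bland_altman(algorithm, reference):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    `algorithm` and `reference` are equal-length sequences holding the
    algorithm output and the ground-truth annotation for the same items.
    """
    if len(algorithm) != len(reference):
        raise ValueError("paired measurements must have equal length")
    if len(algorithm) < 2:
        raise ValueError("need at least two pairs")
    # Per-item difference between algorithm output and ground truth.
    diffs = [a - r for a, r in zip(algorithm, reference)]
    bias = mean(diffs)   # systematic offset of the algorithm
    sd = stdev(diffs)    # sample standard deviation of the differences
    # 95% limits of agreement, assuming roughly normal differences.
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Illustrative, made-up values (e.g., a measurement in pixels); not real data.
algo_values = [10.2, 11.8, 9.5, 12.1, 10.9]
truth_values = [10.0, 12.0, 9.9, 11.7, 11.0]
bias, lower, upper = bland_altman(algo_values, truth_values)
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
```

A bias close to zero with narrow limits suggests good agreement; whether the limits are narrow enough is a judgment that depends on the application.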

                  3.1  Choosing images: Aligning data set with clinical criteria

                  The criteria for choosing test image sets are technical and clinical; awareness of both
                  is essential to promote effective interdisciplinarity and translation.

                  3.1.1   Technical criteria
                  These address how difficult it is for the algorithm being tested to produce the correct
                  answer for a given image. Technical criteria therefore help to choose a set of images
                  spanning all levels of difficulty. Such levels must reflect what the software
                  application is likely to encounter when deployed in the target environment. For example,
                  low contrast makes detection and segmentation difficult, so different levels of contrast
                  should be present in the test set; similarly, the same lesion may vary in appearance, so
                  representative variations must be included. While it is impossible to account for all
                  possible factors and their variations in a test set, an effort should be made to cover at
                  least the major (most common) ones for the target environment. The choices made in
                  building a data set, and their rationale, should be reported in publications using the set.
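
The contrast example above can be sketched as a simple stratified selection: images are grouped into contrast bins and a few are drawn from each bin, so that every level is represented. This is only a minimal sketch under stated assumptions. The function name stratified_by_contrast, the equal-width binning, the per-image contrast scores, and the image identifiers are all hypothetical; how contrast is measured is left to the reader.

```python
import random
from collections import defaultdict


def stratified_by_contrast(contrast_by_image, n_bins=3, per_bin=2, seed=0):
    """Pick up to `per_bin` images from each of `n_bins` equal-width contrast bins.

    `contrast_by_image` maps an image identifier to a precomputed contrast
    score (any consistent scalar measure; the measure itself is an assumption).
    """
    if not contrast_by_image:
        return []
    lo = min(contrast_by_image.values())
    hi = max(contrast_by_image.values())
    # Fall back to a width of 1.0 when all scores are equal (zero range).
    width = (hi - lo) / n_bins or 1.0
    bins = defaultdict(list)
    for image_id, score in sorted(contrast_by_image.items()):
        # Clamp so that the maximum score falls into the last bin.
        idx = min(int((score - lo) / width), n_bins - 1)
        bins[idx].append(image_id)
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    chosen = []
    for idx in range(n_bins):
        members = bins.get(idx, [])
        chosen.extend(rng.sample(members, min(per_bin, len(members))))
    return chosen


# Hypothetical contrast scores for six images; not real data.
scores = {"img_a": 0.10, "img_b": 0.15, "img_c": 0.50,
          "img_d": 0.55, "img_e": 0.90, "img_f": 0.95}
print(stratified_by_contrast(scores, n_bins=3, per_bin=1))
```

A bin that contributes fewer images than requested signals a gap in coverage for that contrast level, which is worth reporting alongside the data set. The same pattern could be applied to other difficulty factors.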

                  3.1.2   Clinical criteria
                  These address how representative the set of patients is of the clinical question that
                  the software system addresses, or of the clinical context in which the system will
                  be used. The characterization of the patient cohorts used in a study is a mandatory
                  and detailed part of clinical papers, typically in a section called “Materials.” To
                  image processing specialists, such a characterization may look unnecessary for testing
                  novel image processing algorithms; it may seem that a large set of images satisfying
                  the technical criteria is enough. This may suffice for publication in medical
                  image processing journals, but not for translation and clinical relevance: clinicians
                  will want to test with images chosen by clinical criteria too, in order to decide
                  whether an algorithm can actually work with real patients in the real world.