Page 166 - Computational Retinal Image Analysis
the evaluation of the overall system and limits processing speed. Notice that the
measurements generated by the systems mentioned above do not always agree; this
is being investigated by various groups [13, 16, 17].
3 Tools and techniques
If your only tool is a hammer, make all your problems look like nails.
Variant of the Law of the Instrument
In practice, comparing the results of an algorithm with the available ground truth (annotations) means studying the statistical agreement or disagreement between two sets of data [5, 18]. Statistics offers well-known instruments (see, e.g., Chapter 10), but the RIA/MIA international communities have never explicitly agreed on a standard protocol. In this section we concisely review the available techniques and their rationale, starting with a few considerations on how to choose image sets.
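As a minimal illustration of what "statistical agreement between two sets of data" can mean in code, the sketch below computes Cohen's kappa, one common chance-corrected agreement statistic, between per-image labels produced by an algorithm and by an annotator. The choice of kappa, the function name `cohen_kappa`, and the binary labels in the example are assumptions made for illustration; the text does not prescribe a specific statistic.

```python
from collections import Counter

def cohen_kappa(algorithm, reference):
    """Chance-corrected agreement between two equal-length sequences of labels.

    Kappa is used here only as an illustrative agreement statistic.
    """
    if not algorithm or len(algorithm) != len(reference):
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(algorithm)
    # Observed agreement: fraction of items given the same label by both sources.
    p_observed = sum(a == r for a, r in zip(algorithm, reference)) / n
    # Chance agreement, estimated from each source's marginal label frequencies.
    counts_alg = Counter(algorithm)
    counts_ref = Counter(reference)
    labels = set(counts_alg) | set(counts_ref)
    p_chance = sum(counts_alg[c] * counts_ref[c] for c in labels) / (n * n)
    if p_chance == 1.0:
        # Both sources used one identical label throughout: kappa is undefined.
        return float("nan")
    return (p_observed - p_chance) / (1.0 - p_chance)

# Hypothetical per-image labels (1 = lesion present, 0 = absent).
algorithm = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
reference = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
kappa = cohen_kappa(algorithm, reference)
print(f"kappa = {kappa:.2f}")  # observed 0.8, chance 0.5 -> kappa = 0.60
```

Values near 1 indicate agreement well beyond chance, while values near 0 indicate agreement no better than chance; raw percentage agreement (0.8 here) alone would hide the fact that half of that agreement is expected by chance.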
3.1 Choosing images: Aligning data set with clinical criteria
The criteria for choosing test image sets are technical and clinical; awareness of both
is essential to promote effective interdisciplinarity and translation.
3.1.1 Technical criteria
These address how difficult it is for the algorithm being tested to produce the correct answer for a given image. Technical criteria therefore help to choose a set of images spanning all levels of difficulty, and these levels must reflect what the software application is likely to encounter when deployed in the target environment. For example, low contrast makes detection and segmentation difficult, so different levels of contrast should be present in the test set; similarly, the same type of lesion may vary in appearance, so representative variations must be included. While it is impossible to account for every possible factor and its variations in a test set, an effort should be made to cover at least the major (most common) ones for the target environment. The choices and rationale applied in building a data set should be reported in publications using the set.
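One way to act on these criteria is to measure a difficulty-related property for each candidate image and then sample from each band of values, so that every level is represented. The sketch below assumes RMS contrast (the standard deviation of intensities scaled by the maximum possible value, here 255 for 8-bit images) as the difficulty proxy; the band edges, the per-band quota, and the function names `rms_contrast` and `select_by_contrast` are hypothetical choices, not a procedure given in the text.

```python
import numpy as np

def rms_contrast(image, max_value=255.0):
    """RMS contrast: standard deviation of intensities scaled to [0, 1].

    Used here as an assumed, illustrative proxy for detection difficulty.
    """
    scaled = np.asarray(image, dtype=float) / max_value
    return float(scaled.std())

def select_by_contrast(images, edges, per_band, seed=0):
    """Sample up to `per_band` image ids from each half-open contrast band [lo, hi)."""
    rng = np.random.default_rng(seed)
    contrast = {name: rms_contrast(img) for name, img in images.items()}
    selection = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = sorted(name for name, c in contrast.items() if lo <= c < hi)
        k = min(per_band, len(band))
        chosen = rng.choice(band, size=k, replace=False) if k > 0 else []
        # An empty or short band signals an under-represented difficulty level.
        selection[(lo, hi)] = sorted(str(name) for name in chosen)
    return selection

# Hypothetical 2x2 "images" with low, medium and high contrast.
images = {
    "flat": np.full((2, 2), 128),
    "mid": np.array([[100, 155], [100, 155]]),
    "high": np.array([[0, 255], [0, 255]]),
}
print(select_by_contrast(images, edges=[0.0, 0.05, 0.2, 1.0], per_band=1))
```

Reporting the bands, their edges and how many images fell into each is one concrete way to document the choices and rationale behind a data set; in practice several proxies (contrast, image quality, lesion type) would likely be combined.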
3.1.2 Clinical criteria
These address how representative the set of patients is of the clinical question the software system addresses, or of the clinical context in which the system will be used. The characterization of the patient cohorts used in a study is a mandatory and detailed part of clinical papers, typically in a section called “Materials.” To image processing specialists, such a characterization may look unnecessary for testing novel image processing algorithms; it may seem that a large set of images satisfying the technical criteria is enough. This may suffice for publication in medical image processing journals, but not for translation and clinical relevance: clinicians will also want to see testing on images chosen by clinical criteria, to decide whether an algorithm can actually work with real patients in the real world.
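The kind of cohort characterization clinicians expect can be tabulated with very little code. The sketch below summarizes a hypothetical list of patient records (the field names `age`, `sex` and `grade`, and the grade categories, are assumptions) and compares the proportion of each disease grade in the test set with an assumed target-population proportion, giving a first, rough indication of how representative the set is.

```python
from collections import Counter
from statistics import mean, stdev

def summarize_cohort(patients, target_grade_props):
    """Summarize a cohort and compare its grade mix with assumed target proportions.

    Record fields (age, sex, grade) are hypothetical, for illustration only.
    """
    n = len(patients)
    if n == 0:
        raise ValueError("cohort is empty")
    ages = [p["age"] for p in patients]
    sexes = Counter(p["sex"] for p in patients)
    grades = Counter(p["grade"] for p in patients)
    # Observed minus target proportion per grade; grades absent from the
    # cohort count as 0, which exposes gaps in representativeness.
    all_grades = sorted(set(grades) | set(target_grade_props))
    grade_gap = {g: grades[g] / n - target_grade_props.get(g, 0.0) for g in all_grades}
    return {
        "n": n,
        "age_mean": mean(ages),
        "age_sd": stdev(ages) if n > 1 else 0.0,
        "sex_pct": {s: 100.0 * c / n for s, c in sorted(sexes.items())},
        "grade_counts": dict(sorted(grades.items())),
        "grade_gap": grade_gap,
    }

# Hypothetical records and hypothetical target-population grade proportions.
patients = [
    {"age": 50, "sex": "F", "grade": "none"},
    {"age": 60, "sex": "M", "grade": "mild"},
    {"age": 70, "sex": "F", "grade": "none"},
    {"age": 60, "sex": "M", "grade": "severe"},
]
target = {"none": 0.7, "mild": 0.2, "moderate": 0.05, "severe": 0.05}
summary = summarize_cohort(patients, target)
for key, value in summary.items():
    print(key, value)
```

In this toy example the "moderate" grade is missing entirely and "severe" is over-represented relative to the assumed target, exactly the kind of mismatch a purely technical selection of images would not reveal.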