In an effective interdisciplinary collaboration on RIA/MIA, it is important for com-
puter scientists to realize that clinicians will find it difficult to see the value of an appli-
cation tested only on technical criteria. It is similarly important for clinicians to realize
that computer scientists will aim to achieve high performance first (Section 3.2), but may need to learn how to create a clinically meaningful data set.
3.2 Direct techniques: Focus on the image processing task
We briefly review the performance assessment techniques that we consider an essential toolkit for RIA/MIA. We do not aim to provide a complete tutorial, only to list the techniques we regard as most important.
The purpose of direct techniques is to compare quantitatively the output of a pro-
gram (e.g., contours of regions, labels for images or parts thereof) with annotations
given in the same format as the program output. Notice that the latter excludes, at this
stage, validation on outcome (Section 3.3).
3.2.1 Receiver operating characteristic (ROC) curves
A receiver operating characteristic curve, or ROC curve [19], is a plot that illustrates the ability of a test to discriminate between two classes against a gold standard (e.g., a computer-generated segmentation vs a hand-drawn segmentation by an expert human grader) or between groups of cases (e.g., separating disease cases from normal ones). It is created by plotting the true positive rate (TPR), or Sensitivity, against
the false positive rate (FPR), i.e., 1 − Specificity, for different threshold settings of a parameter. For every possible parameter value selected to discriminate between the two classes or cases, some data will be correctly classified as positive (TP = True Positive) and some incorrectly classified as negative (FN = False Negative).
Conversely, some data will be correctly classified as negative (TN = True Negative),
but some incorrectly classified as positive (FP = False Positive). Plotting TPR against
FPR generates a curve in which each point represents a sensitivity/specificity pair
corresponding to a particular threshold. The area under the ROC curve (AUC) is a
measure of accuracy, in the sense of the ability of an algorithm to distinguish be-
tween two classes or groups.
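To make the threshold sweep concrete, here is a minimal Python/NumPy sketch (an illustration only, not code from a specific RIA tool; the labels and scores arrays are hypothetical) that computes a TPR/FPR pair at every threshold and estimates the AUC with the trapezoidal rule.

```python
import numpy as np

def roc_points(labels, scores):
    """Return (FPR, TPR) arrays for every threshold over the scores.

    labels: binary ground truth (1 = positive, 0 = negative), e.g.,
            an expert grader's annotations.
    scores: the algorithm's confidence for the positive class.
    """
    n_pos = np.sum(labels == 1)                 # total positives
    n_neg = np.sum(labels == 0)                 # total negatives
    fpr, tpr = [0.0], [0.0]
    for t in np.sort(np.unique(scores))[::-1]:  # sweep high -> low
        pred = scores >= t                      # positive calls at threshold t
        tp = np.sum(pred & (labels == 1))       # true positives
        fp = np.sum(pred & (labels == 0))       # false positives
        tpr.append(tp / n_pos)                  # sensitivity
        fpr.append(fp / n_neg)                  # 1 - specificity
    return np.array(fpr), np.array(tpr)

# Hypothetical data: scores loosely correlated with the true labels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
scores = 0.5 * labels + rng.random(1000)
fpr, tpr = roc_points(labels, scores)
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)  # trapezoidal AUC
print(f"AUC = {auc:.3f}")
```

The curve runs from (0, 0) (threshold above every score, nothing classified positive) to (1, 1) (threshold at the minimum score, everything classified positive); an AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation.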
3.2.2 Accuracy and related measures
In the context of segmentation, for example, comparing the output of a computer algorithm to the ground truth generated by an expert human grader, accuracy (Acc) is often assessed by summing the number of correctly identified image pixels, i.e., those belonging to a region (TP) and those external to the region (TN), and expressing this sum as a fraction of the total number of pixels, P, in the image: Acc = (TP + TN)/P.
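A minimal Python sketch of this measure, using hypothetical synthetic masks (not code from this chapter), is given below; it also previews the imbalance problem discussed next.

```python
import numpy as np

def pixel_accuracy(pred, truth):
    """Acc = (TP + TN) / P for binary segmentation masks."""
    tp = np.sum(pred & truth)         # region pixels correctly labeled
    tn = np.sum(~pred & ~truth)       # background pixels correctly labeled
    return (tp + tn) / truth.size     # P = total number of pixels

# Hypothetical 100 x 100 image whose true region covers only 1% of pixels.
truth = np.zeros((100, 100), dtype=bool)
truth[:10, :10] = True
pred = np.zeros_like(truth)           # a segmenter that finds nothing at all
print(pixel_accuracy(pred, truth))    # prints 0.99 despite zero true positives
```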
However, this can sometimes be misleading if the numbers of pixels inside and outside the region are highly imbalanced. For instance, when segmenting pixels as depicting blood vessels in a retinal image, there may be more than 10 times as many non-vessel pixels as vessel pixels. As long as the majority of non-vessel pixels are correctly