Page 167 - Computational Retinal Image Analysis

162    CHAPTER 9  Validation




In an effective interdisciplinary collaboration on RIA/MIA, it is important for computer scientists to realize that clinicians will find it difficult to see the value of an application tested only on technical criteria. It is similarly important for clinicians to realize that computer scientists will aim to achieve high performance first (Section 3.2), but are likely to have to learn how to create a clinically interesting data set.

3.2 Direct techniques: Focus on the image processing task

We briefly review the performance assessment techniques that we consider an essential toolkit for RIA/MIA. We do not aim to provide a complete tutorial, only to list the techniques that we regard as essential.

The purpose of direct techniques is to compare quantitatively the output of a program (e.g., contours of regions, labels for images or parts thereof) with annotations given in the same format as the program output. Notice that this requirement excludes, at this stage, validation on outcome (Section 3.3).
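For a binary segmentation, this comparison reduces to counting, pixel by pixel, the four kinds of agreement and disagreement that underlie the measures in Sections 3.2.1 and 3.2.2. The following is a minimal sketch, assuming that both the program output and the grader's annotation are binary masks of the same size stored as nested lists of 0/1 (a simplification for illustration; real pipelines usually work on image arrays). The function name `confusion_counts` is hypothetical.

```python
from typing import Sequence


def confusion_counts(predicted: Sequence[Sequence[int]],
                     annotated: Sequence[Sequence[int]]) -> dict[str, int]:
    """Pixel-wise comparison of a binary program output with a binary annotation.

    Both masks must have the same shape; 1 marks a pixel inside the region
    (positive), 0 a pixel outside it (negative).
    """
    if len(predicted) != len(annotated) or any(
        len(p) != len(a) for p, a in zip(predicted, annotated)
    ):
        raise ValueError("masks must have the same shape")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for p_row, a_row in zip(predicted, annotated):
        for p, a in zip(p_row, a_row):
            if p and a:
                counts["TP"] += 1  # region pixel found by the program
            elif p and not a:
                counts["FP"] += 1  # program marks a pixel the grader left out
            elif not p and not a:
                counts["TN"] += 1  # both agree the pixel is outside the region
            else:
                counts["FN"] += 1  # region pixel missed by the program
    return counts


# Hypothetical 3x3 program output and expert annotation.
program_output = [[0, 1, 1],
                  [0, 1, 0],
                  [0, 0, 0]]
grader_annotation = [[0, 1, 0],
                     [0, 1, 1],
                     [0, 0, 0]]
print(confusion_counts(program_output, grader_annotation))
# → {'TP': 2, 'FP': 1, 'TN': 5, 'FN': 1}
```

The four counts always sum to the total number of pixels in the image (9 here), which is the denominator used by pixel accuracy.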


3.2.1 Receiver operating characteristic (ROC) curves
A receiver operating characteristic curve, or ROC curve [19], is a plot that shows how well a test discriminates between two classes, compared against a gold standard (e.g., a computer-generated segmentation vs a segmentation hand-drawn by an expert human grader), or between two groups of cases (e.g., separating disease cases from normal ones). It is created by plotting the true positive rate (TPR), or sensitivity, against the false positive rate (FPR), i.e., 1 - specificity, for different threshold settings of a parameter. For every possible parameter value selected to discriminate between the two classes or cases, some positive data will be correctly classified as positive (TP = True Positive) and some incorrectly classified as negative (FN = False Negative). Conversely, some negative data will be correctly classified as negative (TN = True Negative), but some incorrectly classified as positive (FP = False Positive). Hence TPR = TP/(TP + FN) and FPR = FP/(FP + TN). Plotting TPR against FPR generates a curve in which each point represents a sensitivity/specificity pair corresponding to a particular threshold. The area under the ROC curve (AUC) is a measure of accuracy, in the sense of the ability of an algorithm to distinguish between two classes or groups.
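The threshold sweep and the area under the curve can be sketched in a few lines. This is a minimal illustration, assuming the algorithm outputs a continuous score per pixel or image (higher meaning more likely positive) and that the gold standard supplies binary labels; the function names `roc_curve` and `auc` are hypothetical, and the trapezoidal rule is one common way of estimating the AUC, not necessarily the procedure described in [19].

```python
from typing import Sequence


def roc_curve(scores: Sequence[float], labels: Sequence[int]) -> list[tuple[float, float]]:
    """Return the (FPR, TPR) points obtained by sweeping a threshold over the scores.

    A sample is classified as positive when its score is >= the threshold.
    labels holds the gold standard: 1 for the positive class, 0 for the negative.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Threshold above every score: nothing is classified as positive.
    points = [(0.0, 0.0)]
    # Each distinct score is tried as a threshold, from strictest to most lenient;
    # tied scores therefore move the curve diagonally in a single step.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y != 1)
        # TPR = TP / (TP + FN) = TP / n_pos;  FPR = FP / (FP + TN) = FP / n_neg
        points.append((fp / n_neg, tp / n_pos))
    # The lowest threshold classifies everything as positive, so the curve ends at (1, 1).
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve, estimated with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores from an algorithm and gold-standard labels for 8 samples.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(auc(roc_curve(scores, labels)))  # → 0.75
```

An AUC of 1 indicates perfect separation of the two classes and 0.5 corresponds to chance-level performance. In the example, 12 of the 16 positive/negative pairs are ranked correctly (the positive sample has the higher score), which is where the value 0.75 comes from. The quadratic loop keeps the sketch short; for the millions of pixels in a retinal image set, a library routine such as scikit-learn's `sklearn.metrics.roc_curve` and `sklearn.metrics.roc_auc_score` is more appropriate.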


3.2.2 Accuracy and related measures
In segmentation, for example, when the output of a computer algorithm is compared with the ground truth generated by an expert human grader, accuracy (Acc) is often assessed by counting the correctly identified image pixels, both those belonging to the region (i.e., TP) and those outside it (i.e., TN), and expressing this count as a fraction of the total number of pixels, P, in the image: Acc = (TP + TN)/P. However, this measure can be misleading when the classes are strongly imbalanced, i.e., when the number of pixels belonging to the region is disproportionate to the number outside it. For instance, when segmenting pixels depicting blood vessels in a retinal image, there may be more than 10 times as many non-vessel pixels as vessel pixels. As long as the majority of non-vessel pixels are correctly