identified the accuracy will always be high, even if the identification of vessel pixels is poor. The Jaccard Coefficient (JC) offers an alternative, expressed as JC = TP/(TP + FP + FN). If the vessel segmentation produced by an algorithm matches the ground truth exactly, JC is 1; if there is no overlap at all, JC is 0.
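
As a minimal sketch (assuming binary NumPy masks in which nonzero entries mark vessel pixels; the function name is illustrative), JC could be computed as follows:

    import numpy as np

    def jaccard_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
        # JC = TP / (TP + FP + FN) for binary segmentation masks.
        pred, truth = pred.astype(bool), truth.astype(bool)
        tp = np.logical_and(pred, truth).sum()    # vessel in both masks
        fp = np.logical_and(pred, ~truth).sum()   # vessel in prediction only
        fn = np.logical_and(~pred, truth).sum()   # vessel in ground truth only
        denom = tp + fp + fn
        return float(tp) / denom if denom > 0 else 1.0  # two empty masks agree perfectly

Note that, unlike accuracy, the true negatives (the typically dominant background pixels) do not appear in the formula, which is why JC is not inflated by an abundant background.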

3.2.3   Confusion matrices
For our purposes, a confusion matrix captures the performance of a classifier by showing the number of times the program and an annotator, or two annotators, make each possible pair of joint decisions. The same list of labels appears on both rows and columns. As a simple example, consider two annotators asked to grade the tortuosity of a set of, say, 30 vessels on a 3-point scale, in order to validate a program assessing vessel tortuosity. The data set contains 10 vessels per tortuosity level. The following confusion matrices might result from experiments, where O1, O2 indicate the observers, P the program, and Lk the tortuosity level. The entries can, of course, also be expressed as percentages, e.g., in our case, 10 (33%), 8 (27%), 3 (10%), and so on.


                 O2
    O1       L1      L2      L3
    L1        8       2       1
    L2        2       7       1
    L3        0       0       9

                 P
    O1       L1      L2      L3
    L1        6       2       3
    L2        2       6       1
    L3        1       2       7

                 P
    O2       L1      L2      L3
    L1       10       1       0
    L2        1       9       0
    L3        0       0       9
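
As a minimal sketch, such a matrix can be accumulated from two label vectors (assuming labels are encoded as the integers 1 to 3, one entry per vessel; names are illustrative):

    import numpy as np

    def confusion_matrix(labels_a, labels_b, n_levels=3):
        # Rows follow annotator A, columns annotator B; entry (i, j)
        # counts vessels labeled L(i+1) by A and L(j+1) by B.
        cm = np.zeros((n_levels, n_levels), dtype=int)
        for a, b in zip(labels_a, labels_b):
            cm[a - 1, b - 1] += 1
        return cm

Applied to the 30 labels from the two observers, confusion_matrix(o1_labels, o2_labels) would reproduce the first table above.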

We can see that the program agrees very well with observer O2, but less well with observer O1. We can also see that the observers do not agree perfectly with each other on the classification of levels 1 and 2 (two vessels are labeled L1 by O1 but L2 by O2, and vice versa). Notice that this inter-observer disagreement sets an upper bound on the performance we can meaningfully expect from the program, given our annotators and data set. See Gwet [18] for a discussion of confusion matrices, the Kappa coefficient and related measures.
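
As a minimal sketch of one such measure, Cohen's kappa corrects the raw agreement rate for the agreement expected by chance from the annotators' marginal label frequencies. Applied to the O1-versus-O2 matrix above (the implementation below is an illustration, not taken from [18]):

    import numpy as np

    def cohens_kappa(cm):
        # cm: square confusion matrix of joint decision counts.
        cm = cm.astype(float)
        n = cm.sum()
        p_observed = np.trace(cm) / n                                 # raw agreement
        p_expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2  # chance agreement
        return (p_observed - p_expected) / (1.0 - p_expected)

    cm_o1_o2 = np.array([[8, 2, 1],
                         [2, 7, 1],
                         [0, 0, 9]])
    print(cohens_kappa(cm_o1_o2))  # approximately 0.70

Here p_observed = 24/30 = 0.80 and p_expected ≈ 0.33, giving kappa ≈ 0.70, i.e., well over half of the achievable above-chance agreement is realized.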