
116    4 Statistical Classification


                                 Performing edition on half of the samples, the best result was obtained for k=1,
                                 with an 18% overall error rate. In the training set, the edition process kept 22 out
                                 of 25 patterns from the first class and 23 out of 25 from the second class. The
                                 edition method discarded many important borderline patterns, which
                                 contributed to a degradation of the performance.
                                   In spite of the difficulties posed by the k-NN method, it can still be an interesting
                                 model-free technique for application in some situations. We will see in Chapter 5
                                 how some ideas of this method are incorporated in certain neural network
                                 approaches.


                                 4.3.3 The ROC Curve

                                 The concept of a Receiver Operating Characteristic curve, popularly named ROC
                                 curve, appeared in the fifties as a means of selecting the best voltage threshold
                                 for discriminating pure noise from signal plus noise, in signal detection applications
                                 such as radar. Since the seventies, the concept has been used in the areas of
                                 medicine and psychology, namely for the assessment of diagnostic tests.
                                   The ROC curve is an interesting analysis tool in two-class problems, especially
                                 in  situations where one wants to detect rarely occurring events such as a signal, a
                                 disease, etc. Let us call the absence of the event the normal situation (N) and the
                                 occurrence  of  the rare event the abnormal  situation (A). Figure 4.31  shows the
                                 classification matrix for this situation, with true classes along the rows and decided
                                 (predicted) classifications along the columns.



                                 [Classification matrix image not reproduced; surviving column heading: Decision]

                                 Figure 4.31. The canonical classification matrix for two-class discrimination of an
                                 abnormal event (A) from the normal event (N).
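
The layout described above, with true classes along the rows and decided classes along the columns, can be tallied from labelled data in a few lines. The following is only a minimal sketch: the function name `classification_matrix` and the sample labels are hypothetical and do not come from the text.

```python
from collections import Counter

def classification_matrix(true_labels, decided_labels, classes=("A", "N")):
    """Tally decisions into matrix[true_class][decided_class] -> count.

    Rows are the true classes, columns are the decided classes,
    with A = abnormal and N = normal.
    """
    counts = Counter(zip(true_labels, decided_labels))
    return {t: {d: counts[(t, d)] for d in classes} for t in classes}

# Hypothetical labels, for illustration only.
true_labels    = ["A", "A", "N", "N", "N", "N", "A", "N"]
decided_labels = ["A", "N", "N", "N", "A", "N", "A", "N"]

matrix = classification_matrix(true_labels, decided_labels)

# Print one row per true class, columns in the order (A, N).
for t in ("A", "N"):
    print(t, [matrix[t][d] for d in ("A", "N")])
```

Each row sums to the number of cases that truly belong to that class, so the per-class ratios defined from the matrix are computed row by row.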




                                   From  the  classification  matrix  of  Figure  4.31,  the  following  parameters  are
                                 defined:
                                 - True Positive Ratio ≡ TPR = a/(a+b). Also known as sensitivity, this parameter
                                    tells us how sensitive our decision method is in  the detection of  the abnormal