116 4 Statistical Classification
With edition performed on half of the samples, the best result was obtained for k=1,
with an 18% overall error rate. In the training set, the edition process kept 22 out
of 25 patterns from the first class and 23 out of 25 from the second class. The
edition method discarded many important borderline patterns, which contributed to
a degradation of the performance.
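The loss of borderline patterns can be illustrated with a minimal sketch. It assumes that edition keeps only those candidate patterns that a k-NN rule, built on a separate reference set, classifies correctly; this is one common form of editing and may differ in detail from the procedure used for the results above. The function names (knn_label, edit_patterns) and the one-dimensional data are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math

def knn_label(x, reference, k=1):
    """Majority label among the k reference patterns nearest to x (Euclidean)."""
    nearest = sorted(reference, key=lambda p: math.dist(x, p[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def edit_patterns(candidates, reference, k=1):
    """Keep only the candidates that k-NN on `reference` classifies correctly.

    Assumption: this hold-out style of editing stands in for the edition
    method of the text, whose exact steps are not restated here.
    """
    return [(x, y) for x, y in candidates if knn_label(x, reference, k) == y]

# Made-up 1-D data: class 0 lies near 0, class 1 near 1.
reference = [((0.0,), 0), ((0.2,), 0), ((0.8,), 1), ((1.0,), 1)]
# Two candidates sit well inside their class; two sit close to the border.
candidates = [((0.1,), 0), ((0.45,), 1), ((0.9,), 1), ((0.55,), 0)]

edited = edit_patterns(candidates, reference, k=1)
print(edited)
```

In this toy case the two candidates near the class border (0.45 and 0.55) are the ones discarded, while the two interior patterns survive, which mirrors the effect described above.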
In spite of the difficulties posed by the k-NN method, it can still be an interesting
model-free technique in some situations. We will see in Chapter 5 how some ideas
of this method are incorporated in certain neural network approaches.
4.3.3 The ROC Curve
The concept of a Receiver Operating Characteristic curve, popularly named ROC
curve, appeared in the fifties as a means of selecting the best voltage threshold
discriminating pure noise from signal plus noise, in signal detection applications
such as radar. Since the seventies, the concept has been used in the areas of
medicine and psychology, namely for diagnostic test assessment purposes.
The ROC curve is an interesting analysis tool in two-class problems, especially
in situations where one wants to detect rarely occurring events such as a signal, a
disease, etc. Let us call the absence of the event the normal situation (N) and the
occurrence of the rare event the abnormal situation (A). Figure 4.31 shows the
classification matrix for this situation, with true classes along the rows and decided
(predicted) classifications along the columns.
[Classification matrix: true classes (rows) versus Decision (columns).]
Figure 4.31. The canonical classification matrix for two-class discrimination of an
abnormal event (A) from the normal event (N).
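As a rough sketch of how such a classification matrix could be tallied from labelled cases, the following counts each (true class, decision) pair, with true classes along the rows and decisions along the columns. The function name classification_matrix, the row and column order (A first, then N), and the ten example cases are assumptions made for illustration only.

```python
def classification_matrix(true_classes, decisions, labels=("A", "N")):
    """Count cases per (true class, decision) pair.

    Rows follow the true class, columns follow the decision, both in the
    order given by `labels` (assumed here to be A, then N).
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, d in zip(true_classes, decisions):
        matrix[index[t]][index[d]] += 1
    return matrix

# Made-up example: 3 abnormal (A) and 7 normal (N) cases.
true_classes = ["A", "A", "A", "N", "N", "N", "N", "N", "N", "N"]
decisions    = ["A", "A", "N", "N", "N", "A", "N", "N", "N", "N"]

matrix = classification_matrix(true_classes, decisions)
for label, row in zip(("A", "N"), matrix):
    print(label, row)
```

Here the first row holds the abnormal cases (2 decided as A, 1 decided as N) and the second row holds the normal cases (1 decided as A, 6 decided as N).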
From the classification matrix of Figure 4.31, the following parameters are
defined:
- True Positive Ratio ≡ TPR = a/(a+b). Also known as sensitivity, this parameter
tells us how sensitive our decision method is in the detection of the abnormal