Page 269 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

250 6 Statistical Classification

There is a compromise to be made between sensitivity and specificity. This
compromise becomes evident in the ROC curve shown in Figure 6.19a, which was
obtained with SPSS (using the Data worksheet of Signal & Noise.xls) and
corresponds to eight different threshold values. Notice that, given the limited
number of threshold values, the ROC curve has a stepwise aspect, with different
values of the FPR corresponding to the same sensitivity, as can also be seen in
Table 6.10 for the sensitivity value of 0.7. With a large number of signal
samples and threshold values, one would obtain a smooth ROC curve, as
represented in Figure 6.19b.
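
The way a handful of thresholds produces a stepwise curve can be sketched in a few lines of Python. This is only an illustration: the function name `roc_points`, the toy signal values and the decision rule (a sample is flagged as abnormal when its value is at or above the threshold) are assumptions, not the contents of Signal & Noise.xls or the SPSS procedure.

```python
def roc_points(values, is_abnormal, thresholds):
    """Return (threshold, sensitivity, specificity) for each threshold.

    Assumed decision rule: a sample is classified as abnormal
    when its value is greater than or equal to the threshold.
    """
    n_abnormal = sum(is_abnormal)
    n_normal = len(is_abnormal) - n_abnormal
    rows = []
    for t in thresholds:
        # True positives: abnormal samples at or above the threshold.
        tp = sum(1 for v, a in zip(values, is_abnormal) if a and v >= t)
        # True negatives: normal samples below the threshold.
        tn = sum(1 for v, a in zip(values, is_abnormal) if not a and v < t)
        rows.append((t, tp / n_abnormal, tn / n_normal))
    return rows


# Hypothetical toy data: five abnormal and five normal samples.
values = [1.5, 2.5, 4.5, 5.0, 5.5, 0.2, 0.5, 1.2, 2.4, 3.5]
is_abnormal = [True] * 5 + [False] * 5

rows = roc_points(values, is_abnormal, thresholds=[1, 2, 3, 4])
for t, sens, spec in rows:
    print(f"threshold={t}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

With these toy values, thresholds 3 and 4 share the same sensitivity but have different specificities, which is the kind of horizontal step described above.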

Looking at the ROC curves shown in Figure 6.19, the following characteristic
aspects are clearly visible:

− The ROC curve graphically depicts the compromise between sensitivity and
specificity. If the sensitivity increases, the specificity decreases, and
vice versa.
− All ROC curves start at (0,0) and end at (1,1) (see Exercise 6.7).
− A perfectly discriminating method corresponds to the point (0,1). The ROC
curve is then a horizontal line at sensitivity = 1.

A non-informative ROC curve corresponds to the diagonal line of Figure 6.19,
with sensitivity = 1 − specificity. In this case, the true detection rate of the
abnormal situation is the same as the false detection rate. The best compromise
decision, sensitivity = specificity = 0.5, is then no better than flipping a coin.
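
The diagonal can be illustrated with a small simulation in which the measured value carries no information about the class. In the sketch below, the uniform distribution, the sample size, the seed and the helper name `rate_above` are all assumptions chosen for illustration; they are not taken from the book's datasets.

```python
import random

rng = random.Random(0)  # fixed seed, used only to make the run repeatable
n = 20000

# Both classes are drawn from the same distribution, so the value
# carries no information about the class (assumed toy setting).
abnormal = [rng.random() for _ in range(n)]
normal = [rng.random() for _ in range(n)]


def rate_above(values, threshold):
    """Fraction of samples classified as abnormal (value >= threshold)."""
    return sum(v >= threshold for v in values) / len(values)


thresholds = [i / 10 for i in range(1, 10)]
# Each ROC point is (false positive rate, sensitivity).
points = [(rate_above(normal, t), rate_above(abnormal, t)) for t in thresholds]

# Largest distance between sensitivity and FPR over all thresholds.
max_gap = max(abs(sens - fpr) for fpr, sens in points)
print(f"largest |sensitivity - FPR| = {max_gap:.3f}")
```

For every threshold the sensitivity stays close to the false positive rate, so the points fall near the diagonal, up to sampling fluctuations.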

Table 6.10. Sensitivity and specificity in impulse detection (100 signal values).

Threshold   Sensitivity   Specificity
    1          0.90          0.66
    2          0.80          0.80
    3          0.70          0.87
    4          0.70          0.93

One of the uses of the ROC curve is choosing the decision threshold that best
differentiates between the two situations; in the case of Example 6.10, the
presence of the impulses versus the presence of the noise alone. Let us address
this discrimination issue as a cost decision issue, as we did in section 6.3.1.
Representing the sensitivity and specificity of the method for a threshold ∆
by s(∆) and f(∆), respectively, and using the same notation as in formula 6.20,
we can write the total risk as:

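$$R = \lambda_{aa} P(A)\, s(\Delta) + \lambda_{an} P(A)\,(1 - s(\Delta)) + \lambda_{na} P(N)\, f(\Delta) + \lambda_{nn} P(N)\,(1 - f(\Delta)),$$

or,

$$R = s(\Delta)\,\bigl(\lambda_{aa} P(A) - \lambda_{an} P(A)\bigr) + f(\Delta)\,\bigl(\lambda_{na} P(N) - \lambda_{nn} P(N)\bigr) + \text{constant}.$$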
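
Expanding the first expression shows that the constant collects the terms that do not depend on the threshold, namely λan P(A) + λnn P(N). The sketch below checks numerically that the two forms agree on the rows of Table 6.10. The function names are hypothetical, and the loss values and prevalences are arbitrary numbers chosen only to exercise the algebra; they are not the values used in the book.

```python
def total_risk(s, f, p_a, p_n, lam_aa, lam_an, lam_na, lam_nn):
    """Total risk R written term by term, as in the first expression."""
    return (lam_aa * p_a * s + lam_an * p_a * (1 - s)
            + lam_na * p_n * f + lam_nn * p_n * (1 - f))


def total_risk_grouped(s, f, p_a, p_n, lam_aa, lam_an, lam_na, lam_nn):
    """Same risk with the s and f terms grouped, plus the constant."""
    constant = lam_an * p_a + lam_nn * p_n  # independent of the threshold
    return (s * (lam_aa * p_a - lam_an * p_a)
            + f * (lam_na * p_n - lam_nn * p_n)
            + constant)


# Arbitrary, purely illustrative losses and prevalences (assumptions).
params = dict(p_a=0.3, p_n=0.7, lam_aa=0.1, lam_an=0.9, lam_na=0.2, lam_nn=0.6)

# (sensitivity, specificity) pairs taken from Table 6.10.
pairs = [(0.90, 0.66), (0.80, 0.80), (0.70, 0.87), (0.70, 0.93)]

diffs = [abs(total_risk(s, f, **params) - total_risk_grouped(s, f, **params))
         for s, f in pairs]
print("largest difference between the two forms:", max(diffs))
```

Because the constant does not depend on ∆, only the two grouped terms matter when comparing thresholds.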
