Page 265 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

246      6 Statistical Classification


              Both standard deviations, which can be inspected in text boxes for a selected
           value of n/d, are initially high for low values of n and converge slowly to zero as
           n → ∞. For the situation shown in Figure 6.15, the standard deviation of $\hat{P}e_d(n)$
           changes from 0.089 for n = d (14 cases, 7 per class) to 0.033 for n = 10d (140
           cases, 70 per class).
              Based on the behaviour of the $E[\hat{P}e_d(n)]$ and $E[\hat{P}e_t(n)]$ curves, some criteria
           can be established for the dimensionality ratio. As a general rule of thumb, using
           dimensionality ratios well above 3 is recommended.
              If the cases are not equally distributed among the classes, it is advisable to use the
           smallest number of cases per class as the value of n. Notice also that a multi-class
           problem can be seen as a generalisation of a two-class problem if every class is
           well separated from all the others. Then, the total number of training samples
           needed for a given deviation of the expected error estimates from the Bayes error
           can be estimated as $cn^*$, where $n^*$ is the particular value of n that achieves such a
           deviation in the most unfavourable two-class dichotomy of the multi-class
           problem.
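
              The two rules above can be illustrated with a minimal sketch. It assumes that n
           counts cases per class (as in the Figure 6.15 example, where n = 10d corresponds to
           70 cases per class and hence d = 7) and that c denotes the number of classes. The
           function names are hypothetical, and since "well above 3" is not quantified in the
           text, the check below only tests the weaker condition "above 3".

```python
def dimensionality_ratio(cases_per_class, d):
    """Return n/d, taking n as the smallest number of cases per class."""
    n = min(cases_per_class)
    return n / d


def exceeds_rule_of_thumb(ratio, minimum=3):
    """Weak check only: the text asks for ratios *well* above 3."""
    return ratio > minimum


def total_training_cases(c, n_star):
    """Estimate c * n*, with n* taken from the worst two-class dichotomy."""
    return c * n_star


# Figure 6.15 example: d = 7 features, 70 cases per class, two classes.
ratio = dimensionality_ratio([70, 70], d=7)
total = total_training_cases(c=2, n_star=70)

# Unbalanced classes: the smaller class (21 cases) determines n.
unbalanced_ratio = dimensionality_ratio([70, 21], d=7)
```

              With the balanced data the ratio is 10 and the total is 140 cases, matching the
           figures quoted above; in the unbalanced case the ratio drops to exactly 3, which
           does not satisfy the rule of thumb.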


           6.4  The ROC Curve

           The classifiers presented in the previous sections assumed a certain model of the
           feature vector distributions in the feature space. Other, model-free techniques for
           designing classifiers make no assumptions about the underlying data distributions.
           They are called non-parametric methods. One of these methods is based on the
           choice of appropriate feature thresholds by means of the ROC curve method (where
           ROC stands for Receiver Operating Characteristic).
              The ROC curve method (available with SPSS; see Commands 6.2) appeared in
           the 1950s as a means of selecting the best voltage threshold discriminating pure
           noise from signal plus noise, in signal detection applications such as radar. Since
           the 1970s, the concept has been used in the areas of medicine and psychology,
           namely for test assessment purposes.
              The ROC curve is an interesting analysis tool for two-class problems, especially
           in situations where one wants to detect rarely occurring events such as a special
           signal, a disease, etc., based on the choice of feature thresholds. Let us call the
           absence of the event the normal situation (N) and the occurrence of the rare event
           the abnormal situation (A). Figure 6.16 shows the classification matrix for this
           situation, based on a given decision rule, with true classes along the rows and
           decided (predicted) classifications along the columns⁵.
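
              As an illustration of such a matrix, the following minimal sketch counts the cases
           for a single feature threshold. The decision rule (decide A when the feature value
           is at or above the threshold), the function name and the data are all assumptions
           made for the example; they are not taken from Figure 6.16.

```python
def classification_matrix(values, true_classes, threshold):
    """Return a 2x2 matrix of counts.

    Rows are the true classes (N, A); columns are the decided classes (N, A).
    Assumed decision rule: decide A when value >= threshold, otherwise N.
    """
    index = {"N": 0, "A": 1}
    matrix = [[0, 0], [0, 0]]
    for value, true_class in zip(values, true_classes):
        decided = "A" if value >= threshold else "N"
        matrix[index[true_class]][index[decided]] += 1
    return matrix


# Hypothetical feature values: four normal cases and two abnormal cases.
values = [0.2, 0.4, 0.5, 0.7, 0.6, 1.3]
true_classes = ["N", "N", "N", "N", "A", "A"]

matrix = classification_matrix(values, true_classes, threshold=0.65)
# matrix[0] holds the true N cases, matrix[1] the true A cases.
```

              Here one normal case (0.7) is decided as A and one abnormal case (0.6) is decided
           as N, so the matrix is [[3, 1], [1, 1]]. Moving the threshold changes these counts.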






           5 The reader may notice the similarity of the canonical two-class classification matrix with the
             hypothesis decision matrix in Chapter 4 (Figure 4.2).