Page 279 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

260      6 Statistical Classification


   At each stage of the tree classifier, a simpler problem with a smaller number of
features is solved. This is an additional benefit, particularly in practical multi-class
problems, where it is rather difficult to guarantee normal or even symmetric
distributions with similar covariance matrices for all classes; with the multistage
approach, however, those conditions may be approximately met at each stage,
affording optimal classifiers stage by stage.
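The multistage idea can be sketched as a small routing function: each node of the tree applies its own decision rule on its own features, and a case is passed down until a leaf class is reached. The feature names and thresholds below are illustrative, not taken from any dataset in the book.

```python
# Minimal sketch of a two-stage tree classifier: stage one routes a case
# using one feature; stage two applies a different rule on each branch.
# Feature names (f1, f2, f3) and thresholds are hypothetical.
def classify(case):
    if case["f1"] > 100:                          # stage 1: split into two super-classes
        return "A" if case["f2"] > 0.5 else "B"   # stage 2, left branch
    return "C" if case["f3"] < 10 else "D"        # stage 2, right branch

print(classify({"f1": 150, "f2": 0.7, "f3": 3}))  # routed left, then to class A
```

Each stage only ever confronts a two- or three-class sub-problem, which is why the distributional assumptions are easier to satisfy per stage than globally.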

           Example 6.16

Q: Consider the Breast Tissue dataset (electric impedance measurements of
freshly excised breast tissue) with 6 classes denoted CAR (carcinoma), FAD
(fibro-adenoma), GLA (glandular), MAS (mastopathy), CON (connective) and
ADI (adipose). Derive a decision tree solution for this classification problem.
A: Performing a Kruskal-Wallis analysis, it is readily seen that all the features have
discriminative capabilities, notably I0 and PA500, and that it is practically
impossible to discriminate between classes GLA, FAD and MAS. The low
dimensionality ratio of this dataset for the individual classes (e.g. only 14 cases for
class CON) strongly recommends a decision tree approach, with the use of merged
classes and a greatly reduced number of features at each node.
   As I0 and PA500 are promising features, it is worthwhile to look at the
respective scatter diagram shown in Figure 6.23. Two case clusters are visually
identified: one corresponding to {CON, ADI}, the other to {MAS, GLA, FAD,
CAR}. At the first stage of the tree we then use I0 alone, with a threshold of
I0 = 600, achieving zero errors.
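This first stage is a single univariate threshold, assuming (as the scatter diagram suggests) that CON and ADI lie on the high-I0 side. A minimal sketch:

```python
# Sketch of the first tree stage: one threshold on I0 separates the two
# visually identified clusters. The routing assumes {CON, ADI} have high I0,
# as Figure 6.23 suggests.
def stage_one(i0, threshold=600.0):
    """Route a case to the {CON, ADI} branch or the {CAR, FAD, MAS, GLA} branch."""
    return "CON_ADI" if i0 > threshold else "CAR_FAD_MAS_GLA"

print(stage_one(1800))   # high-impedance case -> CON_ADI branch
print(stage_one(250))    # low-impedance case  -> CAR_FAD_MAS_GLA branch
```

Because the two clusters do not overlap along I0, this single split incurs zero training errors, and every later stage only has to deal with one of the two merged super-classes.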
   At stage two, we attempt the most useful discrimination from the medical point
of view: class CAR (carcinoma) vs. {FAD, MAS, GLA}. Using discriminant
analysis, this can be performed with an overall training set error of about 8%, using
features AREA_DA and IPMAX, whose distributions are well modelled by the
normal distribution.
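The stage-two discriminant can be sketched with a linear discriminant analysis on two features, here via scikit-learn rather than the tools used in the book. The two Gaussian samples stand in for AREA_DA and IPMAX in class CAR and in the merged class {FAD, MAS, GLA}; the class means and covariances are invented for illustration, so the resulting error is not the 8% figure quoted above.

```python
# Sketch of the stage-two discriminant: CAR vs. merged {FAD, MAS, GLA},
# fitted on two features. Data are synthetic stand-ins for AREA_DA and IPMAX.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
car = rng.multivariate_normal([30, 250], [[40, 0], [0, 900]], 21)    # class CAR
rest = rng.multivariate_normal([10, 100], [[40, 0], [0, 900]], 49)   # {FAD, MAS, GLA}
X = np.vstack([car, rest])
y = np.array(["CAR"] * 21 + ["FAD_MAS_GLA"] * 49)

lda = LinearDiscriminantAnalysis().fit(X, y)
train_error = 1 - lda.score(X, y)     # resubstitution (training-set) error
print(f"training-set error: {train_error:.1%}")
```

Linear discriminant analysis is appropriate here precisely because, as noted above, both features are well modelled by normal distributions with similar covariance in the two groups.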


           Figure 6.23. Scatter plot of six classes of breast tissue with features I0 and PA500.