Figure 6.24 shows the corresponding linear discriminant. Two randomised runs of the partition method in halves (i.e., 2-fold cross-validation, with half of the samples used for design and the other half for testing) yielded an average test set error of 8.6%, quite near the design set error. At stage two, the discrimination CON vs. ADI can also be performed with feature I0 (threshold I0 = 1550), with zero errors for ADI and 14% errors for CON.
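A minimal sketch of this partition-in-halves estimate can be written in R with MASS::lda, assuming a data frame tissue holding the features AREA_DA and IPMAX and a binary factor carcls (car vs. not car); the object and column names are illustrative, not taken from the dataset file:

   # Two randomised runs of 2-fold cross-validation (partition in halves)
   # for the linear discriminant of Figure 6.24.
   library(MASS)

   half_split_error <- function(data) {
     n <- nrow(data)
     idx <- sample(n, size = floor(n / 2))        # random half for design
     fit <- lda(carcls ~ AREA_DA + IPMAX, data = data[idx, ])
     pred <- predict(fit, newdata = data[-idx, ])$class
     mean(pred != data$carcls[-idx])              # test set error rate
   }

   set.seed(1)
   mean(replicate(2, half_split_error(tissue)))   # average of the two runs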
With these results, we can establish the decision tree shown in Figure 6.25. At each level of the decision tree, a decision function is used, shown in Figure 6.25 as a decision rule to be satisfied. The left descendant branch corresponds to compliance with the rule, i.e., to a “Yes” answer; the right descendant branch corresponds to a “No” answer.
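In code, such a tree is simply a set of nested rules. The R schematic below mirrors this structure, assuming, as the text suggests, that the first level separates {CON, ADI} from the remaining classes; the level-one rule and the level-two discriminant are dummy stand-ins (only the I0 = 1550 threshold comes from the text, and the inequality direction is an assumption):

   # Schematic of the two-level decision tree as nested rules. rule1() and
   # g() are placeholder stand-ins for the actual decision functions of
   # Figure 6.25, with illustrative values only.
   rule1 <- function(x) x$I0 >= 1000              # placeholder level-1 rule
   g     <- function(a, i) 2 * a - i - 10         # placeholder discriminant
   classify_case <- function(x) {
     if (rule1(x)) {                              # "Yes" -> left branch
       if (x$I0 >= 1550) "ADI" else "CON"         # threshold direction assumed
     } else {                                     # "No" -> right branch
       if (g(x$AREA_DA, x$IPMAX) > 0) "CAR" else "FAD/MAS/GLA"
     }
   }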
Since a small number of features is used at each level (one at the first level and two at the second), we maintain a reasonably high dimensionality ratio at both levels; we therefore obtain reliable error estimates, with narrow 95% confidence intervals (less than 2% at the first level and about 3% at the CAR vs. {FAD, MAS, GLA} level).
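These interval widths follow from the usual normal approximation to the binomial distribution, ê ± 1.96√(ê(1 − ê)/n), where n is the number of test cases available at the node in question; a one-line R check:

   # 95% confidence interval for an error rate e estimated on n test cases
   # (normal approximation to the binomial distribution).
   err_ci <- function(e, n) e + c(-1, 1) * 1.96 * sqrt(e * (1 - e) / n)
   # e.g., err_ci(0.1, 50) gives roughly 0.1 -/+ 0.083 (illustrative values)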


Figure 6.24. Scatter plot of breast tissue classes CAR and {MAS, GLA, FAD} (denoted not car) using features AREA_DA (horizontal axis) and IPMAX (vertical axis), showing the linear discriminant separating the two classes.


For comparison purposes, the same four-class discrimination was carried out with a single linear classifier using the same three features, I0, AREA_DA and IPMAX, as in the hierarchical approach. Figure 6.26 shows the classification matrix. Given that the distributions are roughly symmetric, although with some deviations in the covariance matrices, the optimal error achieved with linear discriminants should be close to what is shown in the classification matrix. The degraded performance compared with the decision tree approach is evident.
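A sketch of this single-stage run in R, again with MASS::lda and the illustrative tissue data frame, now assuming a four-level factor class4 (CAR, CON, ADI and the merged FAD/MAS/GLA group):

   # Single linear classifier for the four-class problem, using the same
   # three features as the tree; the classification matrix is obtained by
   # cross-tabulating true against predicted labels.
   library(MASS)
   fit4  <- lda(class4 ~ I0 + AREA_DA + IPMAX, data = tissue)
   pred4 <- predict(fit4)$class                  # design set predictions
   table(actual = tissue$class4, predicted = pred4)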
On the other hand, if our only interest is to discriminate class car from all the others, a linear classifier with only one feature can achieve this discrimination with a