
204    5 Neural Networks

                                 Consider the CTG dataset with 10 classes. This constitutes a difficult
                               classification problem, given the proximity and overlap of the classes. By
                               performing Kruskal-Wallis tests and inspecting box-and-whiskers plots of the data,
                               one can gain some insight into the discriminative capability of the features. In
                               particular, one can discard features that do not contribute to the discrimination
                               (FM, DS, DP, MODE, NZER) and determine the ones that are more discriminative
                               (A, DL, V, MSTV, ALTV, WIDTH, DP, all with Kruskal-Wallis H > 1000). Factor
                               analysis is also helpful in providing a picture of the extent of class overlap and the
                               main features describing the data (see Exercises 2.12 and 3.8). The difficulty of
                               this problem can be seen in Figure 5.38.
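
The feature screening described above can be sketched as follows. This is only a minimal illustration, not the procedure actually run on the CTG data: the two features (`shifted`, `noise`) and the three classes are synthetic and hypothetical, and the H statistic is computed without the correction for ties. A feature whose distribution differs between classes yields a large H, whereas a feature with the same distribution in all classes yields an H close to the number of classes minus one.

```python
import random


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted_sum, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted_sum += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * weighted_sum - 3 * (n + 1)


# Synthetic, hypothetical data: three classes with 100 cases each.
rng = random.Random(0)
features = {
    # Class means differ, so this feature should discriminate well.
    "shifted": [[rng.gauss(mu, 1.0) for _ in range(100)] for mu in (0.0, 1.0, 2.0)],
    # Same distribution in every class, so this feature should be discarded.
    "noise": [[rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(3)],
}

h_values = {name: kruskal_wallis_h(groups) for name, groups in features.items()}

# Rank the features from most to least discriminative.
for name in sorted(h_values, key=h_values.get, reverse=True):
    print(f"{name}: H = {h_values[name]:.1f}")
```

With real data one would compute H for each of the candidate features across the 10 classes and keep those with the largest values, confirming the choice with box-and-whiskers plots.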
                                 Using the Statistica intelligent problem solver, an MLP18:22:10 solution was
                               obtained with good performance. Afterwards, this was trained with the conjugate
                               gradient method, and correct classification rates of 90%, 83% and 85.2% were found
                               for the training set (1063 cases), verification set (531 cases) and test set (532
                               cases), respectively. As a matter of fact, the classification error differences for the
                               three sets were mainly due to class C, represented by few cases in all sets. Without
                               class C patterns the error rates for the three sets were quite similar, making it
                               acceptable to merge the results for the three different sets, in order to obtain an
                               overall classification matrix as shown in Table 5.7.
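
As a quick check of what merging the three sets implies, the per-set rates can be combined into a single overall rate by weighting each by its number of cases. The sketch below assumes the training-set rate is 90% and uses only the set sizes and rates quoted above; the variable names are illustrative.

```python
# Set sizes and correct classification rates as quoted in the text
# (the training-set rate is assumed to be 90%).
set_sizes = {"training": 1063, "verification": 531, "test": 532}
correct_rates = {"training": 0.90, "verification": 0.83, "test": 0.852}

total_cases = sum(set_sizes.values())

# Case-weighted average of the three rates.
overall_correct = sum(
    set_sizes[name] * correct_rates[name] for name in set_sizes
) / total_cases
overall_error = 1.0 - overall_correct

print(f"cases: {total_cases}")
print(f"overall correct rate: {overall_correct:.1%}")
print(f"overall error rate: {overall_error:.1%}")
```

Under these assumptions the merged result covers 2126 cases, with an overall correct classification rate of about 87% and an overall error of about 13%.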



                               Table 5.7. Classification matrix of CTG classes, with the conjugate-gradient
                               method. True classifications along the rows; predicted classifications along the
                               columns.


                                   A     B     C     D     SH    AD    DE    LD    FS    SUSP

                                 Notice the surprisingly high degree of performance (overall correct
                               classification rate of 87%)
                               obtained with an MLP for this hard classification problem, with the exception of