204 5 Neural Networks
Consider the CTG dataset with 10 classes. This constitutes a difficult
classification problem, given the proximity and overlap of the classes. By
performing Kruskal-Wallis tests and inspecting box-whiskers plots of the data, it
is possible to gain some insight into the discriminative capability of the
features. In particular, one can discard features that do not contribute to the
discrimination (FM, DS, DP, MODE, NZER) and determine the more discriminative
ones (A, DL, V, MSTV, ALTV, WIDTH, DP, all with Kruskal-Wallis H > 1000). Factor
analysis is also helpful in providing a picture of the extent of class overlap and
of the main features describing the data (see Exercises 2.12 and 3.8). The
difficulty of this problem can be seen in Figure 5.38.
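The feature screening described above can be sketched as follows. This is a minimal, self-contained illustration of the Kruskal-Wallis H statistic (without tie correction) applied to one feature at a time; the class samples below are synthetic Gaussian data, not the CTG measurements:

```python
import random

def kruskal_wallis_h(groups):
    # H = 12/(N(N+1)) * sum_i(R_i^2 / n_i) - 3(N+1),
    # where R_i is the rank sum of group i over the pooled sample.
    pooled = sorted((x, g) for g, grp in enumerate(groups) for x in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    return (12.0 / (n * (n + 1))
            * sum(r * r / len(grp) for r, grp in zip(rank_sums, groups))
            - 3 * (n + 1))

random.seed(0)
# Hypothetical feature values for three classes: well separated vs. overlapping.
separated = [[random.gauss(m, 1.0) for _ in range(50)] for m in (0.0, 3.0, 6.0)]
overlapping = [[random.gauss(0.0, 1.0) for _ in range(50)] for _ in range(3)]

print("H (separated classes):  ", kruskal_wallis_h(separated))
print("H (overlapping classes):", kruskal_wallis_h(overlapping))
```

A discriminative feature yields a large H (the values above 1000 quoted for the CTG features arise from many cases and many well-separated classes), while a feature whose class distributions overlap yields an H near its chi-square expectation.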
Using the Statistica Intelligent Problem Solver, an MLP 18:22:10 solution was
obtained with good performance. This network was then trained with the conjugate-
gradient method, and correct classification rates of 90.7%, 83% and 85.2% were
found for the training set (1063 cases), verification set (531 cases) and test set
(532 cases), respectively. As a matter of fact, the classification error differences for the
three sets were mainly due to class C, represented by few cases in all sets. Without
class C patterns the error rates for the three sets were quite similar, making it
acceptable to merge the results for the three different sets, in order to obtain an
overall classification matrix as shown in Table 5.7.
Table 5.7. Classification matrix of CTG classes, with the conjugate-gradient
method. True classifications along the rows; predicted classifications along the
columns.
A B C D SH AD DE LD FS SUSP
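The overall figure obtained by merging the three sets can be checked as the size-weighted average of the per-set correct classification rates quoted earlier (a sketch, taking the training-set rate as 90.7%):

```python
# (rate, number of cases) for the training, verification and test sets.
sets = [(0.907, 1063), (0.830, 531), (0.852, 532)]

# Size-weighted pooling of the correct-classification rates.
overall = sum(rate * n for rate, n in sets) / sum(n for _, n in sets)
print(f"overall correct classification: {overall:.1%}")
```

The result, about 87%, agrees with the overall performance cited below for the merged classification matrix.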
Notice the surprisingly good performance (overall correct classification rate of
87%) obtained with an MLP for this hard classification problem, with the exception of