for the H values. For the cork stopper features all p values are zero; therefore, Table 2.1 lists only the H values, from the most discriminative feature, ART, to the least discriminative, N.
Table 2.1. Cork stopper features in descending order of Kruskal-Wallis' H (three classes).

Feature     H
ART       121.6
PRTM      117.6
PKT       115.7
ARTG      115.7
ARTM      111.5
PUG       113.3
RA        105.2
NG        104.4
RN         94.3
N          74.5
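As an aside, a ranking of this kind is easy to reproduce computationally. The sketch below uses scipy.stats.kruskal; the feature names, the synthetic data and the class shifts are assumptions made purely for illustration and are not the cork stopper data.

import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)
n_per_class, n_features = 50, 4

# Hypothetical data: three classes, each an (n_per_class, n_features) array
# drawn with a different mean shift so that the features carry some
# class information.
classes = [rng.normal(loc=shift, size=(n_per_class, n_features))
           for shift in (0.0, 0.5, 1.0)]

h_values = {}
for j in range(n_features):
    groups = [c[:, j] for c in classes]   # values of feature j, one sample per class
    h, p = kruskal(*groups)               # Kruskal-Wallis H and its p value
    h_values["feature_%d" % j] = h

# List the features from most to least discriminative, as in Table 2.1.
for name, h in sorted(h_values.items(), key=lambda kv: kv[1], reverse=True):
    print(name, "H = %.1f" % h)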
2.6 The Dimensionality Ratio Problem
In section 2.1.1 we saw that a complex decision surface in a low dimensionality
space can be made linear, and therefore much simpler, in a higher dimensionality
space. Let us take another look at formula (2-14a). It is obvious that by adding
more features one can only increase the distance of a pattern to a class mean.
Therefore, it seems that working in high dimensionality feature spaces, or
equivalently, using arbitrarily complex decision surfaces (e.g. Figure 2.5), can only
increase the classification or regression performance.
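The following small numeric check illustrates the point, under the assumption that formula (2-14a) amounts to a sum of non-negative per-feature contributions, as in a squared Euclidean distance to the class mean; the values used are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
x_d = rng.normal(size=5)    # a pattern described by d = 5 features
m_d = rng.normal(size=5)    # the class mean in the same 5-dimensional space

dist_d = np.sum((x_d - m_d) ** 2)             # squared distance using d features
extra_x, extra_m = 2.0, -1.0                  # hypothetical values of a (d+1)-th feature
dist_d1 = dist_d + (extra_x - extra_m) ** 2   # squared distance using d + 1 features

# The added term (x_{d+1} - m_{d+1})^2 is never negative, so the distance
# can only grow (or stay the same) when a feature is appended.
assert dist_d1 >= dist_d
print(dist_d, dist_d1)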
However, this expected performance increase is not verified in practice. The
reason for this counterintuitive result, which is related to the reliable estimation of
the classifier or regressor parameters, is one of the forms of what is generally known
as the curse of dimensionality.
In order to get some insight into what is happening, let us assume that we are
confronted with a two-class classification problem whose data collection
progresses slowly. Initially, we only have n=6 patterns available for each class. The
patterns are represented in a d=2 feature space as shown in Figure 2.23. The two
classes seem linearly separable with 0% error. Fine!
Meanwhile, a few more cases have been gathered and we now have n=10
patterns available. The corresponding scatter plot is shown in Figure 2.24. Hmm... The
situation is getting tougher. It seems that, after all, the classes were not as
separable as we had imagined... However, a quadratic decision function still seems
to provide a solution...
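The following Monte Carlo sketch, which is not from the text (the least-squares linear discriminant and the sample sizes are choices made only for illustration), shows why such small samples can look deceptively separable: when the number of patterns per class is small relative to the number of features, even classes drawn from exactly the same distribution yield an optimistically low training error, and that error approaches zero as d grows towards the total number of patterns.

import numpy as np

rng = np.random.default_rng(2)

def apparent_training_error(n_per_class, d, trials=200):
    errors = []
    for _ in range(trials):
        # Both classes are drawn from the SAME distribution: there is no
        # real separability to be found.
        X = rng.normal(size=(2 * n_per_class, d))
        y = np.r_[np.zeros(n_per_class), np.ones(n_per_class)]
        # Least-squares linear discriminant with a bias term, fitted to
        # targets -1/+1 (a simple stand-in for any linear classifier).
        A = np.c_[X, np.ones(len(X))]
        w, *_ = np.linalg.lstsq(A, 2 * y - 1, rcond=None)
        pred = (A @ w > 0).astype(float)
        errors.append(np.mean(pred != y))
    return np.mean(errors)

for n, d in [(6, 2), (10, 2), (10, 18)]:
    print("n=%d per class, d=%d: mean training error = %.2f"
          % (n, d, apparent_training_error(n, d)))

For indistinguishable classes the true error is 50%; the training error reported above is much lower and shrinks as more features are added, which is precisely the over-optimism that the dimensionality ratio problem warns about.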