Page 196 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB
an alternative strategy will be discussed: the reduction of the dimension
of the measurement vector. An additional advantage of this strategy is
that it automatically reduces the computational complexity.
For the reduction of the measurement space, two different approaches
exist. One is to discard certain elements of the vector and to select the
ones that remain. This type of reduction is called feature selection. It is
discussed in Section 6.2. The other approach is feature extraction. Here, the
selection of elements takes place in a transformed measurement space.
Section 6.3 addresses the problem of how to find suitable transforms.
Both methods rely on the availability of optimization criteria. These are
discussed in Section 6.1.
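To make the distinction concrete, here is a minimal NumPy sketch. It assumes that the N-dimensional measurement vectors z are stored as the rows of an array and that the extractor is linear, y = Wz with W a D×N matrix. The function names and the random matrix W are hypothetical illustrations, not the transforms developed in Section 6.3.

```python
import numpy as np

def select_features(Z, subset):
    # Feature selection: keep only the measurement elements listed in `subset`.
    return Z[:, subset]

def extract_features(Z, W):
    # Linear feature extraction (assumed form y = W z, W is D x N):
    # every new feature is a linear combination of all N measurements.
    return Z @ W.T

rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 5))          # 100 measurement vectors, N = 5

Y_sel = select_features(Z, [0, 3])     # D = 2 selected elements
W = rng.normal(size=(2, 5))            # arbitrary D x N transform (hypothetical)
Y_ext = extract_features(Z, W)         # D = 2 extracted features
print(Y_sel.shape, Y_ext.shape)        # (100, 2) (100, 2)

# Selection is the special case of extraction in which the rows of W
# are rows of the identity matrix.
W_sel = np.eye(5)[[0, 3], :]
print(np.allclose(Y_sel, extract_features(Z, W_sel)))  # True
```

The last check shows why selection can be viewed as a restricted form of extraction: it only permits transforms that pick out existing elements, whereas extraction may combine them.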
6.1 CRITERIA FOR SELECTION AND EXTRACTION
The first step in the design of optimal feature selectors and feature
extractors is to define a quantitative criterion that expresses how well
such a selector or extractor performs. The second step is to do the actual
optimization, i.e. to use that criterion to find the selector/extractor that
performs best. Such an optimization can be performed either analytically
or numerically.
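The two-step procedure (define a criterion, then optimize it) can be sketched numerically as an exhaustive search over all subsets of d measurement elements, scoring each subset with a criterion J. This is a minimal sketch under stated assumptions. The two-class toy criterion (squared distance between the class means divided by the average within-class variance) and the synthetic data are hypothetical. They were chosen only to show how a criterion drives the search, not as the criteria defined in this section.

```python
from itertools import combinations
import numpy as np

def best_subset(Z, labels, d, criterion):
    # Exhaustive search: score every subset of d measurement elements with
    # `criterion` and keep the subset with the highest score.
    best, best_score = None, -np.inf
    for subset in combinations(range(Z.shape[1]), d):
        score = criterion(Z[:, list(subset)], labels)
        if score > best_score:
            best, best_score = subset, score
    return best, best_score

def toy_criterion(Y, labels):
    # Hypothetical two-class criterion (larger is better): squared distance
    # between the class means divided by the average within-class variance.
    Y0, Y1 = Y[labels == 0], Y[labels == 1]
    between = np.sum((Y1.mean(axis=0) - Y0.mean(axis=0)) ** 2)
    within = 0.5 * (Y0.var(axis=0).sum() + Y1.var(axis=0).sum())
    return between / within

# Synthetic data: 4 measurements, only element 2 differs between the classes.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 100)
Z = rng.normal(size=(200, 4))
Z[labels == 1, 2] += 3.0

best, score = best_subset(Z, labels, 1, toy_criterion)
print(best)  # (2,)
```

Exhaustive search needs one criterion evaluation for each of the C(N, d) subsets. Because this number grows combinatorially with N, the approach is practical only for short measurement vectors.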
Within a Bayesian framework, ‘best’ means the selector or extractor with
minimal risk. Often, the cost of misclassification is difficult to assess,
or even completely unknown. Therefore, the risk is often replaced by the
error rate E as the optimization criterion. Techniques to assess the error
rate empirically by means of a validation set are discussed in Section 5.4.
However, in this section we need to be able to manipulate the criterion
mathematically. Unfortunately, the mathematical structure of the error
rate is complex: it involves integrals of the class-conditional densities
over the decision regions, which rarely have a closed form. The current
section therefore introduces some alternative, approximate criteria that
are simple enough for a mathematical treatment.
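The link between risk and error rate can be made explicit. With the uniform cost function (cost 0 for a correct decision, cost 1 for any error), the expected cost equals the probability of misclassification, so minimizing the risk and minimizing E coincide. Below is a minimal sketch that estimates both from a labelled validation set. It assumes integer class labels and a cost matrix indexed as cost[decided class, true class]. This indexing convention and the example labels are assumptions of the sketch.

```python
import numpy as np

def empirical_risk(true_labels, decided, cost):
    # Average cost over the validation set; cost[i, j] is the (assumed)
    # cost of deciding class i when the true class is j.
    return np.mean(cost[decided, true_labels])

def empirical_error_rate(true_labels, decided):
    # Fraction of misclassified validation samples.
    return np.mean(decided != true_labels)

# Hypothetical validation labels and classifier decisions for K = 3 classes.
true_labels = np.array([0, 1, 1, 2, 0])
decided = np.array([0, 1, 2, 2, 1])

uniform_cost = 1.0 - np.eye(3)          # 0 on the diagonal, 1 elsewhere
print(empirical_error_rate(true_labels, decided))          # 0.4
print(empirical_risk(true_labels, decided, uniform_cost))  # 0.4

# An asymmetric cost matrix: deciding class 2 when the truth is class 1 is
# (hypothetically) five times as expensive as any other error.
asym_cost = uniform_cost.copy()
asym_cost[2, 1] = 5.0
print(empirical_risk(true_labels, decided, asym_cost))     # 1.2
```

With non-uniform costs, the two measures diverge. Falling back on E therefore amounts to assuming that all errors are equally expensive.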
In feature selection and feature extraction, these simple criteria are
used as alternative performance measures. Preferably, such performance
measures have the following properties:
• The measure increases as the average distance between the expectation
  vectors of different classes increases. This property is based on the
  assumption that the class information of a measurement vector is mainly
  in the differences between the class-dependent expectations.
• The measure decreases with increasing noise scattering. This property
  is based on the assumption that the noise on a measurement