Page 207 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB
P. 207
196 FEATURE EXTRACTION AND SELECTION
F(N) consisting of D < N elements taken from z. For each element y d
there exists an element z n such that y d ¼ z n . The number of distinguish-
able subsets for a given D is:
N N!
qðDÞ¼ ¼ ð6:28Þ
D ðN DÞ!D!
This quantity expresses the number of different combinations that can
be made from N elements, each combination containing D elements and
with no two combinations containing exactly the same D elements. We
will adopt an enumeration of the index j according to: j ¼ 1, .. . , q(D).
In Section 6.1 the concept of a performance measure has been intro-
duced. These performance measures evaluate the appropriateness of a
measurement vector for the classification task. Let J(F j (D)) be a perform-
ance measure related to the subset F j (D). The particular choice of
J(:) depends on the problem at hand. For instance, in a multi-class
problem, the interclass/intraclass distance J INTER=INTRA (:) could be useful;
see (6.10).
^
F
The goal of feature selection is to find a subset F(D) with dimension D
such that this subset outperforms all other subsets with dimension D:
^
F FðDÞ¼ F i ðDÞ with: JðF i ðDÞÞ JðF j ðDÞÞ for all j 2f1; .. . ; qðDÞg
ð6:29Þ
An exhaustive search for this particular subset would solve the problem of
feature selection. However, there is one practical problem. How to accom-
plish the search process? An exhaustive search requires q(D) evaluations of
J(F j (D)). Even in a simple classification problem, this number is enormous.
Forinstance, letusassume that N ¼ 20 and D ¼ 10. This gives about
5
2 10 different subsets. A doubling of the dimension to N ¼ 40 requires
9
about 10 evaluations of the criterion. Needless to say that even in problems
with moderate complexity, an exhaustive search is out of the question.
Obviously, a more efficient search strategy is needed. For this, many
options exist, but most of them are suboptimal (in the sense that they
cannot guarantee that the subset with the best performance will be
found). In the remaining part of this section, we will first consider a
search strategy called ’branch-and-bound’. This strategy is one of the
few that guarantees optimality (under some assumptions). Next, we
continue with some of the much faster, but suboptimal strategies.