Page 215 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB
Linear feature extraction may also improve the ability to generalize. In
the Gaussian case with unequal covariance matrices, overfitting is likely
to occur if the number of samples in the training set is of the same order
as the number of parameters, $KN^2$. But linear feature extraction, provided
that D << N, helps to improve the generalization ability.
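To make the order-of-magnitude argument concrete, the following minimal sketch counts the free parameters of K Gaussian class models before and after reducing the dimension from N to D. It assumes one mean vector and one symmetric covariance matrix per class, and it ignores the entries of W itself. The function name and the example values (K = 3, N = 20, D = 2) are illustrative assumptions, not taken from the text.

```python
def gaussian_parameter_count(num_classes, dim):
    """Free parameters of num_classes Gaussians in dim dimensions.

    Assumption: each class has its own mean (dim values) and its own
    symmetric covariance matrix (dim * (dim + 1) / 2 distinct values),
    so the total grows as num_classes * dim**2.
    """
    per_class = dim + dim * (dim + 1) // 2
    return num_classes * per_class


# Hypothetical example: K = 3 classes, N = 20 measurements, D = 2 features.
params_measurement_space = gaussian_parameter_count(3, 20)
params_feature_space = gaussian_parameter_count(3, 2)
print(params_measurement_space, params_feature_space)
```

With these assumed values the count drops from 690 to 15, so a training set of a given size constrains the class models in the feature space far better than those in the measurement space.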
We assume the availability of a training set and a suitable performance
measure J(). The design of a feature extraction method boils down to
finding the matrix W that – for the given training set – optimizes the
performance measure.
The performance measure of a feature vector y = Wz is denoted by
J(y) or J(Wz). With this notation, the optimal feature extraction is:
$$W = \arg\max_{W} \{ J(Wz) \} \qquad (6.32)$$
Under the condition that J(Wz) is continuously differentiable in W, the
solution of (6.32) must satisfy:
$$\frac{\partial J(Wz)}{\partial W} = 0 \qquad (6.33)$$
Finding a solution of either (6.32) or (6.33) gives us the optimal linear
feature extraction. The search can be accomplished numerically using
the training set. Alternatively, the search can also be done analytically
assuming parameterized conditional densities. Substitution of estimated
parameters (using the training set) gives the matrix W.
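As a rough illustration of the numerical route, the sketch below searches for a 1 x 2 matrix W (a single direction, D = 1, N = 2) that maximizes a performance measure computed from a small training set. Everything specific here is an assumption made for the example: the toy data, the function names, the grid search over angles, and the choice of J as the standard Bhattacharyya distance between two one-dimensional Gaussians fitted to the projected classes. Because this J does not change when W is scaled, the search is restricted to unit-length directions.

```python
import math


def mean_and_cov(samples):
    """Sample mean and unbiased sample covariance of a list of 2-D points."""
    n = len(samples)
    mx = sum(p[0] for p in samples) / n
    my = sum(p[1] for p in samples) / n
    cxx = sum((p[0] - mx) ** 2 for p in samples) / (n - 1)
    cyy = sum((p[1] - my) ** 2 for p in samples) / (n - 1)
    cxy = sum((p[0] - mx) * (p[1] - my) for p in samples) / (n - 1)
    return (mx, my), ((cxx, cxy), (cxy, cyy))


def project(w, m, c):
    """Mean and variance of y = w z when z has mean m and covariance c."""
    mu = w[0] * m[0] + w[1] * m[1]
    var = w[0] * w[0] * c[0][0] + 2 * w[0] * w[1] * c[0][1] + w[1] * w[1] * c[1][1]
    return mu, var


def bhattacharyya_1d(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two 1-D Gaussian densities."""
    avg = 0.5 * (var1 + var2)
    return 0.125 * (mu1 - mu2) ** 2 / avg + 0.5 * math.log(avg / math.sqrt(var1 * var2))


def search_direction(class1, class2, steps=180):
    """Grid search over unit vectors w = (cos t, sin t), 0 <= t < pi."""
    m1, c1 = mean_and_cov(class1)
    m2, c2 = mean_and_cov(class2)
    best_w, best_j = None, -math.inf
    for i in range(steps):
        theta = math.pi * i / steps
        w = (math.cos(theta), math.sin(theta))
        mu1, var1 = project(w, m1, c1)
        mu2, var2 = project(w, m2, c2)
        j = bhattacharyya_1d(mu1, var1, mu2, var2)
        if j > best_j:
            best_w, best_j = w, j
    return best_w, best_j


# Hypothetical training set: two classes that differ only in their x-position.
CLASS1 = [(-0.5, -1.0), (0.5, 1.0), (-0.5, 1.0), (0.5, -1.0), (0.0, 0.0)]
CLASS2 = [(2.5, -1.0), (3.5, 1.0), (2.5, 1.0), (3.5, -1.0), (3.0, 0.0)]

best_w, best_j = search_direction(CLASS1, CLASS2)
print(best_w, best_j)
```

For this symmetric toy set the search selects the direction along the first axis, w = (1, 0), with J = 4.5. A grid search is only practical for very small problems; for larger N and D a gradient-based or other iterative optimizer would normally replace it.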
In the remaining part of this section, the last approach is worked out
for two particular cases: feature extraction for two-class problems with
Gaussian densities and feature extraction for multi-class problems based
on the inter/intra distance measure. The former case will be based on the
Bhattacharyya distance.
6.3.1 Feature extraction based on the Bhattacharyya distance
with Gaussian distributions
In the two-class case with Gaussian conditional densities a suitable
performance measure is the Bhattacharyya distance. In equation (6.19),
$J_{BHAT}$ implicitly gives the Bhattacharyya distance as a function of the
parameters of the Gaussian densities of the measurement vector z. These
parameters are the conditional expectations $m_k$ and covariance matrices
$C_k$. Substitution of y = Wz gives the expectation vectors and covariance