Page 215 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB

204                         FEATURE EXTRACTION AND SELECTION

              Linear feature extraction may also improve the ability to generalize. If, in
            the Gaussian case with unequal covariance matrices, the number of samples
            in the training set is of the same order as the number of parameters,
            $KN^2$, overfitting is likely to occur. But linear feature extraction,
            provided that $D \ll N$, helps to improve the generalization ability.
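
The order-of-magnitude argument above can be made concrete with a small count. The sketch below assumes that K is the number of classes, N the dimension of the measurement vector z and D the dimension of the feature vector y, as the notation suggests; the function name and the example values are hypothetical. It counts only the Gaussian parameters (means and symmetric covariance matrices), not the D x N entries of W itself.

```python
def gaussian_param_count(K, N):
    """Free parameters of K Gaussian class densities in N dimensions:
    N mean components plus N(N+1)/2 distinct covariance entries per class."""
    return K * (N + N * (N + 1) // 2)

# Hypothetical example values, chosen only for illustration.
K, N, D = 3, 50, 5
full = gaussian_param_count(K, N)     # grows like K*N^2
reduced = gaussian_param_count(K, D)  # after y = Wz with D << N
print(full, reduced)
```

With a training set of a few thousand samples, the full-space model would have roughly as many parameters as samples, whereas the reduced model is far smaller.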
              We assume the availability of a training set and a suitable performance
            measure J(). The design of a feature extraction method boils down to
            finding the matrix W that – for the given training set – optimizes the
            performance measure.
              The performance measure of a feature vector y = Wz is denoted by
            J(y) or J(Wz). With this notation, the optimal feature extraction is:


            $$W = \underset{W}{\operatorname{argmax}} \{ J(Wz) \} \tag{6.32}$$
            Under the condition that J(Wz) is continuously differentiable in W, the
            solution of (6.32) must satisfy:

            $$\frac{\partial J(Wz)}{\partial W} = 0 \tag{6.33}$$
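
As an illustration of how a stationarity condition of this kind is used, consider the special case D = 1, so that W reduces to a single row w^T. The ratio criterion below, and the symmetric matrices S_b and S_w in it, are assumptions made only for this sketch and are not defined in this passage. Setting the gradient to zero turns the optimization into a generalized eigenvalue problem:

```latex
J(\mathbf{w}) = \frac{\mathbf{w}^T \mathbf{S}_b \mathbf{w}}{\mathbf{w}^T \mathbf{S}_w \mathbf{w}},
\qquad
\frac{\partial J}{\partial \mathbf{w}}
= \frac{2\,\mathbf{S}_b \mathbf{w}}{\mathbf{w}^T \mathbf{S}_w \mathbf{w}}
- \frac{2\,(\mathbf{w}^T \mathbf{S}_b \mathbf{w})\,\mathbf{S}_w \mathbf{w}}
       {(\mathbf{w}^T \mathbf{S}_w \mathbf{w})^2}
= \mathbf{0}
\;\Longrightarrow\;
\mathbf{S}_b \mathbf{w} = J(\mathbf{w})\,\mathbf{S}_w \mathbf{w}
```

The condition is necessary but not sufficient: every eigenvector satisfies it, and only the one with the largest eigenvalue maximizes J.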
            Finding a solution of either (6.32) or (6.33) gives us the optimal linear
            feature extraction. The search can be accomplished numerically using
            the training set. Alternatively, the search can also be done analytically
            assuming parameterized conditional densities. Substitution of estimated
            parameters (using the training set) gives the matrix W.
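
A minimal sketch of such a search, under stated assumptions: the two classes are modelled as Gaussians whose means and covariance matrices are estimated from a synthetic training set, the performance measure J(Wz) is taken to be the standard Bhattacharyya distance between two Gaussians, and the maximization uses a crude random hill-climb. The hill-climb is swapped in purely for illustration and is not the method developed in the text. All function names, the data and the settings are hypothetical.

```python
import numpy as np

def bhattacharyya(m1, C1, m2, C2):
    """Bhattacharyya distance between the Gaussians N(m1, C1) and N(m2, C2)."""
    C = 0.5 * (C1 + C2)
    dm = m1 - m2
    term_mean = 0.125 * dm @ np.linalg.solve(C, dm)
    # Log-determinants avoid overflow/underflow of the raw determinants.
    _, ld = np.linalg.slogdet(C)
    _, ld1 = np.linalg.slogdet(C1)
    _, ld2 = np.linalg.slogdet(C2)
    return term_mean + 0.5 * (ld - 0.5 * (ld1 + ld2))

def J(W, m1, C1, m2, C2):
    """Performance of y = Wz: the parameters map to W m_k and W C_k W^T."""
    return bhattacharyya(W @ m1, W @ C1 @ W.T, W @ m2, W @ C2 @ W.T)

def hill_climb(W0, params, rng, n_iter=2000, step=0.1):
    """Keep a random perturbation of W only if it increases J."""
    W, best = W0.copy(), J(W0, *params)
    for _ in range(n_iter):
        cand = W + step * rng.standard_normal(W.shape)
        val = J(cand, *params)
        if val > best:
            W, best = cand, val
    return W, best

# Synthetic two-class training set (illustrative only).
rng = np.random.default_rng(0)
N, D = 5, 2
z1 = rng.standard_normal((200, N))
z2 = (rng.standard_normal((200, N)) * np.array([1.0, 2.0, 0.5, 1.0, 1.5])
      + np.array([1.0, 0.0, 0.5, 0.0, 0.0]))

# Estimated parameters are substituted into the performance measure.
params = (z1.mean(axis=0), np.cov(z1, rowvar=False),
          z2.mean(axis=0), np.cov(z2, rowvar=False))

W0 = rng.standard_normal((D, N))
J_init = J(W0, *params)
W_opt, J_opt = hill_climb(W0, params, rng)
J_full = J(np.eye(N), *params)  # distance in the original N-dimensional space
print(J_init, J_opt, J_full)
```

The value reached by the search cannot exceed the distance in the original space, because a linear map cannot increase the Bhattacharyya distance. Comparing J_opt with J_full therefore indicates how much separability the D-dimensional features retain.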
              In the remaining part of this section, the last approach is worked out
            for two particular cases: feature extraction for two-class problems with
            Gaussian densities and feature extraction for multi-class problems based
            on the inter/intra distance measure. The former case will be based on the
            Bhattacharyya distance.



            6.3.1  Feature extraction based on the Bhattacharyya distance
                   with Gaussian distributions

            In the two-class case with Gaussian conditional densities a suitable
            performance measure is the Bhattacharyya distance. In equation (6.19),
            $J_{BHAT}$ implicitly gives the Bhattacharyya distance as a function of the
            parameters of the Gaussian densities of the measurement vector z. These
            parameters are the conditional expectations $m_k$ and covariance matrices
            $C_k$. Substitution of y = Wz gives the expectation vectors and covariance