Classification, Parameter Estimation and State Estimation: An Engineering Approach Using MATLAB, p. 228

FEATURE REDUCTION

The D × N matrix W_D transforms the N-dimensional measurement space into a D-dimensional feature space. Ideally, the transform is such that y is a good representation of z despite the lower dimension of y. This objective is pursued by selecting W_D such that an (unbiased)¹ linear MMSE estimate ẑ_lMMSE for z based on y yields a minimum mean square error (see Section 3.1.5):

    W_D = \arg\min_W \; E\left[ \left\| \hat{z}_{lMMSE}(y) - z \right\|^2 \right]  \quad \text{with} \quad  y = Wz        (7.3)
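As a numerical illustration (a NumPy sketch rather than the book's MATLAB, using a hypothetical 3 × 3 covariance matrix C_z), the code below evaluates the mean square error of the linear MMSE reconstruction of z from y = Wz, and confirms the non-uniqueness discussed next: premultiplying W by any invertible A leaves the minimum unchanged.

```python
import numpy as np

# Hypothetical covariance matrix of the zero-mean measurement vector z.
C_z = np.array([[4.0, 1.0, 0.5],
                [1.0, 2.0, 0.3],
                [0.5, 0.3, 1.0]])

def lmmse_error(W, C_z):
    """Mean square error of the linear MMSE estimate of z from y = W z.

    With zero-mean z: C_y = W C_z W^T, C_zy = C_z W^T, and the estimator
    is z_hat = C_zy C_y^{-1} y, giving MSE = trace(C_z - C_zy C_y^{-1} C_zy^T).
    """
    C_y = W @ C_z @ W.T
    C_zy = C_z @ W.T
    return np.trace(C_z - C_zy @ np.linalg.inv(C_y) @ C_zy.T)

W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])      # keep the first two measurements (D = 2)
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])           # any invertible 2 x 2 matrix

# The transformation A is inverted by the linear MMSE procedure,
# so the minimum mean square error is identical for W and A W.
print(np.isclose(lmmse_error(W, C_z), lmmse_error(A @ W, C_z)))  # True
```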
It is easy to see that this objective function does not provide a unique solution for W_D. If a minimum is reached for some W_D, then any matrix AW_D is another solution with the same minimum (provided that A is invertible), because the transformation A will be inverted by the linear MMSE procedure. For uniqueness, we add two requirements. First, we require that the information carried by the individual elements of y adds up. By that we mean that if y is the optimal D-dimensional representation of z, then the optimal (D − 1)-dimensional representation is obtained from y simply by deleting its least informative element. Usually, the elements of y are sorted in decreasing order of importance, so that the least informative element is always the last element. With this convention, the matrix W_{D−1} is obtained from W_D simply by deleting the last row of W_D.
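This nesting convention can be checked numerically. The NumPy sketch below (hypothetical C_z; it anticipates the eigenvector solution derived further on) builds W_3 from the eigenvectors of C_z sorted by decreasing eigenvalue, obtains W_2 by deleting the last row, and verifies that the extra reconstruction error equals the smallest eigenvalue — i.e. the deleted element was indeed the least informative one.

```python
import numpy as np

# Hypothetical covariance matrix of the zero-mean vector z.
C_z = np.array([[4.0, 1.0, 0.5],
                [1.0, 2.0, 0.3],
                [0.5, 0.3, 1.0]])

evals, evecs = np.linalg.eigh(C_z)   # eigenvalues in ascending order
order = np.argsort(evals)[::-1]      # re-sort: decreasing importance
W_3 = evecs[:, order].T              # rows = eigenvectors of C_z
W_2 = W_3[:-1, :]                    # delete the last (least informative) row

def lmmse_error(W, C_z):
    """MSE of the linear MMSE reconstruction of z from y = W z."""
    C_y = W @ C_z @ W.T
    C_zy = C_z @ W.T
    return np.trace(C_z - C_zy @ np.linalg.inv(C_y) @ C_zy.T)

# Dropping the last row costs exactly the smallest eigenvalue of C_z.
extra_error = lmmse_error(W_2, C_z) - lmmse_error(W_3, C_z)
print(np.isclose(extra_error, np.min(evals)))  # True
```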
  The requirement leads to the conclusion that the elements of y must be uncorrelated. If not, then the least informative element would still carry predictive information about the other elements of y, which conflicts with our requirement. Hence, the covariance matrix C_y of y must be a diagonal matrix, say Λ_D. If C_z is the covariance matrix of z, then:

    C_y = W_D C_z W_D^T = \Lambda_D        (7.4)
For D = N it follows that C_z W_N^T = W_N^T \Lambda_N, because W_N is an invertible matrix (in fact, an orthogonal matrix) and W_N^T W_N must be a diagonal matrix (see Appendix B.5 and C.3.2). As Λ_N is a diagonal matrix, the columns of W_N^T must be eigenvectors of C_z. The diagonal elements of Λ_N are the corresponding eigenvalues.
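To make this result concrete, the following NumPy sketch (hypothetical C_z) forms W_N from the eigenvectors of C_z, sorted by decreasing eigenvalue, and checks that W_N is orthogonal and that C_y = W_N C_z W_N^T is the diagonal matrix Λ_N of eq. (7.4).

```python
import numpy as np

# Hypothetical covariance matrix of the zero-mean vector z (N = 3).
C_z = np.array([[4.0, 1.0, 0.5],
                [1.0, 2.0, 0.3],
                [0.5, 0.3, 1.0]])

evals, evecs = np.linalg.eigh(C_z)   # C_z evecs = evecs diag(evals)
order = np.argsort(evals)[::-1]      # decreasing eigenvalue order
Lambda_N = np.diag(evals[order])     # diagonal matrix of eigenvalues
W_N = evecs[:, order].T              # columns of W_N^T are eigenvectors of C_z

# W_N is orthogonal, and it diagonalizes C_z: C_y = W_N C_z W_N^T = Lambda_N.
print(np.allclose(W_N @ W_N.T, np.eye(3)))          # True
print(np.allclose(W_N @ C_z @ W_N.T, Lambda_N))     # True
```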
  The solution is still not unique because each element of y can be scaled individually without changing the minimum. Therefore, the second requirement is that each column of W_N^T has unit length. Since the
¹ Since z and y are zero mean, the unbiased linear MMSE estimator coincides with the linear MMSE estimator.