
eigenvectors are orthogonal, this requirement is fulfilled by $\mathbf{W}_N \mathbf{W}_N^T = \mathbf{I}$, with $\mathbf{I}$ the $N \times N$ unit matrix. With that, $\mathbf{W}_N$ establishes a rotation on $\mathbf{z}$. The rows of the matrix $\mathbf{W}_N$, i.e. the eigenvectors, must be sorted such that the eigenvalues form a non-ascending sequence. For arbitrary $D$, the matrix $\mathbf{W}_D$ is constructed from $\mathbf{W}_N$ by deleting the last $N - D$ rows.
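A minimal MATLAB sketch of this construction, assuming the covariance matrix Cz of z is given (the names Cz, W_N, W_D and D are illustrative, not taken from the text):

    [V, Lambda] = eig(Cz);                       % eigenvectors in the columns of V
    [~, order]  = sort(diag(Lambda), 'descend'); % non-ascending eigenvalues
    W_N = V(:, order)';                          % rows of W_N are the sorted eigenvectors
    W_D = W_N(1:D, :);                           % delete the last N - D rows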
  The interpretation of this is as follows (see Figure 7.1). The operator $\mathbf{W}_N$ performs a rotation on $\mathbf{z}$ such that its orthonormal basis aligns with the principal axes of the ellipsoid associated with the covariance matrix of $\mathbf{z}$. The coefficients of this new representation of $\mathbf{z}$ are called the principal components. The axes of the ellipsoid point in the principal directions. The MMSE approximation of $\mathbf{z}$ using only $D$ coefficients is obtained by nullifying the principal components with the smallest variances. Hence, if the principal components are ordered according to their variances, the elements of $\mathbf{y}$ are formed by the first $D$ principal components. The linear MMSE estimate is:

$$\hat{\mathbf{z}}_{\mathrm{lMMSE}}(\mathbf{y}) = \mathbf{W}_D^T \mathbf{y} = \mathbf{W}_D^T \mathbf{W}_D \mathbf{z}$$
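Continuing the sketch above (assuming $\mathbf{z}$ has zero mean, or that its mean has already been subtracted):

    y     = W_D * z;    % the first D principal components of z
    z_hat = W_D' * y;   % = W_D' * W_D * z, the linear MMSE estimate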
            PCA can be used as a first step to reduce the dimension of the measure-
            ment space. In practice, the covariance matrix is often replaced by the
            sample covariance estimated from a training set. See Section 5.2.3.
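As a sketch of that practice (Z is an illustrative $N \times N_S$ matrix holding one training sample per column; none of these names come from the text):

    Zc = Z - repmat(mean(Z, 2), 1, size(Z, 2));  % subtract the sample mean
    Cz = (Zc * Zc') / (size(Z, 2) - 1);          % sample covariance (Section 5.2.3)
    [V, Lambda] = eig(Cz);
    [~, order]  = sort(diag(Lambda), 'descend');
    W_D = V(:, order(1:D))';                     % first D eigenvectors as rows
    Y   = W_D * Zc;                              % D x NS reduced representation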
              Unfortunately, PCA can be counter-productive for classification and
            estimation problems. The PCA criterion selects a subspace of the feature
            space, such that the variance of z is conserved as much as possible.
            However, this is done regardless of the classes. A subspace with large
            variance is not necessarily one in which classes are well separated.
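The following sketch illustrates this with entirely made-up data: two classes that differ only along a low-variance axis become indistinguishable after PCA to one dimension, because the first principal direction is the high-variance axis, which carries no class information:

    NS = 200;
    Z1 = [10*randn(1, NS);  0.5 + 0.1*randn(1, NS)];  % class 1
    Z2 = [10*randn(1, NS); -0.5 + 0.1*randn(1, NS)];  % class 2
    Z  = [Z1, Z2];
    Zc = Z - repmat(mean(Z, 2), 1, 2*NS);             % zero-mean data
    [V, Lambda] = eig(cov(Zc'));
    [~, order]  = sort(diag(Lambda), 'descend');
    w = V(:, order(1))';   % first principal direction: the high-variance axis
    y = w * Zc;            % 1D features: the two classes now overlap completely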



Figure 7.1  Principal component analysis. [Figure: an ellipse in the $(z_0, z_1)$ plane with principal axes of lengths $\sqrt{\lambda_0}$ and $\sqrt{\lambda_1}$ pointing in the rotated directions $y_0$ and $y_1$; an original vector $\mathbf{z}$ and its reconstruction from the first principal component $y_0$ are marked.]