Page 227 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB

216                                     UNSUPERVISED LEARNING

            chapter we will discuss two main characteristics which can be explored:
            the subspace structure of data and its clustering characteristics. The first
            tries to summarize the objects using a smaller number of features than
            the original number of measurements; the second tries to summarize the
            data set using a smaller number of objects than the original number.
            Subspace structure is often interesting for visualization purposes. The
            human visual system is highly capable of finding and interpreting struc-
            ture in 2D and 3D graphical representations of data. When higher
            dimensional data is available, a transformation to 2D or 3D might
            facilitate its interpretation by humans. Clustering serves a similar
            interpretive purpose, but it also enables data reduction. When very
            large amounts of data are available, it is often more efficient to work
            with cluster representatives instead of the whole data set. In Section
            7.1 we treat feature reduction; in Section 7.2 we discuss clustering.



            7.1   FEATURE REDUCTION

            The most popular unsupervised feature reduction method is principal
            component analysis (Jolliffe, 1986). This will be discussed in Section
            7.1.1. One of the drawbacks of this method is that it is a linear method,
            so nonlinear structures in the data cannot be modelled. In Section 7.1.2
            multi-dimensional scaling is introduced, which is a nonlinear feature
            reduction method.




            7.1.1  Principal component analysis

            The purpose of principal component analysis (PCA) is to transform a
            high dimensional measurement vector z to a much lower dimensional
            feature vector y by means of an operation:

                                 y = W_D (z − z̄)                        (7.1)

            such that z can be reconstructed accurately from y. Here z̄ is the
            expectation of the random vector z. It is constant for all realiza-
            tions of z. Without loss of generality, we can assume that z̄ = 0,
            because we can always introduce a new measurement vector z̃ = z − z̄
            and apply the analysis to this vector. Hence, under that assumption,
            we have:

                                     y = W_D z                           (7.2)
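
            To make the operation y = W_D (z − z̄) concrete, the following is a
            minimal sketch in Python rather than the book's MATLAB: it estimates
            z̄ and the covariance matrix from a small made-up 2-D sample, takes
            the principal eigenvector of the 2×2 covariance in closed form as the
            single row of W_D (so D = 1), and projects the centred data onto it.
            The data values and the function name pca_1d are illustrative
            assumptions, not from the book.

```python
import math

def pca_1d(data):
    """Project 2-D samples onto their principal axis: y = W_D (z - z_bar), D = 1."""
    n = len(data)
    # sample mean z_bar
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # centred samples z~ = z - z_bar
    centred = [(x - mx, y - my) for x, y in data]
    # 2x2 sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centred) / n
    b = sum(x * y for x, y in centred) / n
    c = sum(y * y for _, y in centred) / n
    # largest eigenvalue of a symmetric 2x2 matrix (closed form)
    lam = (a + c) / 2.0 + math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    # corresponding eigenvector; handle the (near-)diagonal case separately
    if abs(b) > 1e-12:
        vx, vy = b, lam - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    w = (vx / norm, vy / norm)      # unit row of W_D
    # project each centred vector onto the principal direction
    y = [w[0] * zx + w[1] * zy for zx, zy in centred]
    return y, w

# illustrative data only
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0)]
y, w = pca_1d(data)
```

            Because the projections are taken of centred data, the resulting
            one-dimensional features y sum to zero, which is a convenient sanity
            check on any implementation of Eq. (7.1).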