

            the membership of z_n with respect to the k-th cluster. After proper
            initialization, the EM algorithm proceeds as follows:

            Algorithm 7.6: EM algorithm for training a GTM:

            1. E step: estimate the missing data:

                  x_{n,k} = p(\mathbf{q}_k \mid \mathbf{z}_n; \hat{\mathbf{W}}, \hat{\sigma}^2)
                          = \frac{p(\mathbf{z}_n \mid \mathbf{q}_k; \hat{\mathbf{W}}, \hat{\sigma}^2)}
                                 {\sum_{k'=1}^{K} p(\mathbf{z}_n \mid \mathbf{q}_{k'}; \hat{\mathbf{W}}, \hat{\sigma}^2)}                (7.41)

               using Bayes' theorem.
            2. M step: re-estimate the parameters:

                  \hat{\mathbf{W}}^T = \left(\boldsymbol{\Phi}^T \mathbf{G} \boldsymbol{\Phi} + \lambda \mathbf{I}\right)^{-1} \boldsymbol{\Phi}^T \mathbf{X}^T \mathbf{Z}                (7.42)

                  \hat{\sigma}^2 = \frac{1}{N_S N} \sum_{n=1}^{N_S} \sum_{k=1}^{K} x_{n,k} \, \lVert \hat{\mathbf{W}} \mathbf{g}(\mathbf{q}_k) - \mathbf{z}_n \rVert^2                (7.43)

            3. Repeat 1 and 2 until no significant changes occur.
            G is a K × K diagonal matrix with elements G_{kk} = Σ_{n=1}^{N_S} x_{n,k}.
            Φ is a K × M matrix containing the basis functions: Φ_{k,m} = φ_m(q_k).
            X is an N_S × K matrix with the x_{n,k} as elements. Z is an N_S × N matrix;
            it contains the data vectors z_n^T as its rows. Finally, λ is a
            regularization parameter, which is needed in cases where Φ^T G Φ becomes
            singular and the inverse cannot be computed.
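              To make the two steps concrete, the following is a minimal MATLAB sketch
            of one iteration of Algorithm 7.6. It assumes that the likelihoods
            p(z_n|q_k; W, σ^2) are isotropic Gaussians with variance σ^2 centred on
            the mapped grid points W g(q_k), and that the variables follow the
            notation above; the function name gtm_em_step and all variable names are
            illustrative, not part of the text.

            function [W, sigma2, X] = gtm_em_step(Z, PHI, W, sigma2, lambda)
            % One EM iteration for GTM training (sketch of Algorithm 7.6).
            % Z      : N_S x N data matrix, rows are the measurement vectors z_n'
            % PHI    : K x M basis function matrix, PHI(k,m) = phi_m(q_k)
            % W      : N x M current weight matrix
            % sigma2 : current noise variance
            % lambda : regularization parameter
            [NS, N] = size(Z);  [K, M] = size(PHI);

            % E step (7.41): memberships x_{n,k} via Bayes' theorem
            Y  = PHI * W';                      % K x N mapped grid points W*g(q_k)
            D2 = zeros(NS, K);                  % squared distances ||W g(q_k) - z_n||^2
            for k = 1:K
              dZ = Z - repmat(Y(k,:), NS, 1);
              D2(:,k) = sum(dZ.^2, 2);
            end
            A = -D2 / (2*sigma2);
            A = A - repmat(max(A,[],2), 1, K);  % subtract row maximum for stability
            X = exp(A);
            X = X ./ repmat(sum(X,2), 1, K);    % N_S x K, each row sums to one

            % M step (7.42): re-estimate W
            G  = diag(sum(X,1));                % K x K diagonal matrix
            Wt = (PHI'*G*PHI + lambda*eye(M)) \ (PHI'*X'*Z);   % M x N, equals W'
            W  = Wt';

            % M step (7.43): re-estimate the noise variance with the new W
            Y = PHI * W';
            for k = 1:K
              dZ = Z - repmat(Y(k,:), NS, 1);
              D2(:,k) = sum(dZ.^2, 2);
            end
            sigma2 = sum(sum(X .* D2)) / (NS*N);

            Calling this function repeatedly until the changes in W and sigma2 become
            negligible implements step 3 of the algorithm.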
              The GTM can be initialized by performing PCA on the data set, projecting
            the data to D dimensions and finding W such that the resulting latent
            variables q_n approximate the projected data as well as possible. The noise
            variance σ^2 can be found as the average of the N − D smallest eigenvalues,
            as in probabilistic PCA.
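              A possible MATLAB sketch of this initialization is given below. It
            assumes that the K grid points q_k are collected row-wise in a matrix Q
            with roughly unit scale; the scaling of the principal axes by the square
            roots of the eigenvalues and the function name gtm_init are illustrative
            choices, not prescribed by the text.

            function [W, sigma2] = gtm_init(Z, PHI, Q)
            % PCA-based initialization of the GTM (illustrative sketch).
            % Z   : N_S x N data matrix
            % PHI : K x M basis function matrix
            % Q   : K x D matrix with the grid points q_k as rows
            [NS, N] = size(Z);
            [K, D]  = size(Q);

            mu = mean(Z, 1);                    % sample mean
            [V, L] = eig(cov(Z));               % eigenvectors/values of the covariance
            [ev, idx] = sort(diag(L), 'descend');
            V = V(:, idx);

            % target positions: grid points placed on the plane spanned by the
            % first D principal components, scaled by the standard deviations
            T = repmat(mu, K, 1) + Q * diag(sqrt(ev(1:D))) * V(:,1:D)';

            % least-squares fit PHI*W' ~ T, so that W g(q_k) approximates the
            % projected data as well as possible
            W = (PHI \ T)';                     % N x M

            % noise variance: average of the N-D smallest eigenvalues
            sigma2 = mean(ev(D+1:N));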
              Training the GTM is usually quick and converges well. However, it
            suffers from the standard problem of EM algorithms in that it may
            converge to a local optimum. Furthermore, its success depends highly
            on the choices for K (the number of grid points), M (the number of basis
            functions for the mapping), σ_φ (the width of these basis functions) and