Page 258 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB


            some Gaussian noise with variance $\sigma^2$ in all directions. The full model
            for the probability of observing a vector z is:


            $$p(z \mid q; W, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{D/2}} \exp\left( -\frac{\| f(q; W) - z \|^2}{2\sigma^2} \right) \qquad (7.36)$$

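
As a minimal sketch of (7.36), the following Python function evaluates the logarithm of the noise density for one observation. The function name `log_noise_density` and the argument `f_q` (the value of f(q; W), assumed to be computed elsewhere) are illustrative choices, not part of the text. Returning the logarithm rather than the density itself is an implementation choice that avoids underflow when D is large.

```python
import math

def log_noise_density(z, f_q, sigma2):
    """Log of the isotropic Gaussian density in (7.36).

    z      : observed vector, a sequence of D floats
    f_q    : mapped latent point f(q; W), same length as z (assumed precomputed)
    sigma2 : noise variance sigma^2, must be positive
    """
    if len(z) != len(f_q):
        raise ValueError("z and f_q must have the same dimension")
    if sigma2 <= 0.0:
        raise ValueError("sigma2 must be positive")
    D = len(z)
    # Squared Euclidean distance ||f(q; W) - z||^2
    sq_dist = sum((fi - zi) ** 2 for fi, zi in zip(f_q, z))
    # log of 1 / (2*pi*sigma^2)^(D/2), plus the exponent of (7.36)
    return -0.5 * D * math.log(2.0 * math.pi * sigma2) - sq_dist / (2.0 * sigma2)
```

Calling `math.exp(log_noise_density(z, f_q, sigma2))` recovers the density value of (7.36) when it is representable as a float.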
            In general, the distribution of z in the high dimensional space can be
            found by integration over the latent variable q:

            $$p(z \mid W, \sigma^2) = \int p(z \mid q; W, \sigma^2) \, p(q) \, dq \qquad (7.37)$$

            In order to allow an analytical solution of this integral, a simple grid-like
            probability model is chosen for p(q), just like in the SOM:


            $$p(q) = \frac{1}{K} \sum_{k=1}^{K} \delta(q - q_k) \qquad (7.38)$$

            i.e. a set of Dirac functions centred on the grid nodes $q_k$. The log-likelihood
            of the complete model can then be written as:

            $$\ln L(W, \sigma^2) = \sum_{n=1}^{N_S} \ln \left( \frac{1}{K} \sum_{k=1}^{K} p(z_n \mid q_k; W, \sigma^2) \right) \qquad (7.39)$$

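
With the Dirac prior of (7.38), the integral in (7.37) collapses to an equally weighted mixture of K Gaussians, one per grid node, which is the sum inside the logarithm of (7.39). The sketch below evaluates that log-likelihood. It assumes the mapped grid nodes f(q_k; W) have already been computed and are passed in as `centres`. The function names are hypothetical, and the log-sum-exp step is a numerical safeguard added here, not something the text prescribes.

```python
import math

def log_gauss(z, centre, sigma2):
    """Log of the isotropic Gaussian density (7.36) around one mapped node."""
    D = len(z)
    sq_dist = sum((c - x) ** 2 for c, x in zip(centre, z))
    return -0.5 * D * math.log(2.0 * math.pi * sigma2) - sq_dist / (2.0 * sigma2)

def log_likelihood(Z, centres, sigma2):
    """Log-likelihood (7.39) of the samples in Z.

    Z       : list of N_S observed vectors z_n
    centres : list of K mapped grid nodes f(q_k; W) (assumed precomputed)
    sigma2  : noise variance sigma^2
    """
    K = len(centres)
    total = 0.0
    for z in Z:
        terms = [log_gauss(z, c, sigma2) for c in centres]
        m = max(terms)
        # log( (1/K) * sum_k exp(terms[k]) ), computed stably
        total += m + math.log(sum(math.exp(t - m) for t in terms)) - math.log(K)
    return total
```

A larger value indicates that the mapped grid lies closer to the data for the given noise variance, which is the quantity the parameter estimation tries to maximize.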
            Still, the functional form for the mapping function f(q; W) has to be
            defined. This function maps the low dimensional grid to a manifold in
            the high dimensional space. Therefore, its form controls how nonlinear
            the manifold can become. In the GTM, a regression on a set of fixed
            basis functions is used:


            $$f(q; W) = W \gamma(q) \qquad (7.40)$$

            $\gamma(q)$ is a vector containing the outputs of the M basis functions, which are
            usually chosen to be Gaussian with means on the grid points and a fixed
            width $\sigma_\varphi$. W is an $N \times M$ weight matrix, with N the dimension of z
            (denoted D in (7.36)).
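
The mapping (7.40) can be sketched in a few lines of Python. The text only states that the basis functions are Gaussian with a fixed width, so the unnormalized form exp(-||q - mu_m||^2 / (2 sigma_phi^2)) used below is an assumption, as are the names `basis_outputs` and `gtm_map` and the representation of W as a list of rows.

```python
import math

def basis_outputs(q, basis_centres, width):
    """gamma(q): outputs of M Gaussian basis functions sharing one width.

    q             : latent point, a sequence of floats
    basis_centres : list of M centres mu_m in the latent space
    width         : common width sigma_phi (unnormalized Gaussian assumed)
    """
    outputs = []
    for mu in basis_centres:
        sq_dist = sum((qi - mi) ** 2 for qi, mi in zip(q, mu))
        outputs.append(math.exp(-sq_dist / (2.0 * width ** 2)))
    return outputs

def gtm_map(q, W, basis_centres, width):
    """f(q; W) = W gamma(q), with W given as a list of rows of length M."""
    g = basis_outputs(q, basis_centres, width)
    # One output component per row of W (one row per dimension of z)
    return [sum(w * gj for w, gj in zip(row, g)) for row in W]

# Usage: two basis functions in a 2-D latent space, mapped into a 3-D data space
centres = [[0.0, 0.0], [1.0, 0.0]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
point = gtm_map([0.0, 0.0], W, centres, width=1.0)
```

Because W enters linearly, the flexibility of the manifold is set by the number of basis functions M and their width, not by W itself.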
              Given settings for K, M and $\sigma_\varphi$, the EM algorithm can be used to
            estimate W and $\sigma^2$. Let the complete data be $y_n^T = [z_n^T \; x_n^T]$, with $x_n$ the
            hidden variables. $x_n$ is a K-dimensional vector. The element $x_{n,k}$ codes