

              One method for fitting the parameters of (7.13) to the data is to use
            maximum likelihood estimation (see Section 3.1.4), i.e. to optimize the
            log-likelihood:


            \[
            L(Z|C) = \ln p(z_1, \ldots, z_{N_S} \mid C)
                   = \ln \prod_{n=1}^{N_S} p(z_n \mid C)   \qquad (7.14)
            \]

            with Z = {z_1, ..., z_{N_S}}. The optimization of L(Z|C) w.r.t. C is compli-
            cated and cannot be done directly with this equation. Fortunately, there
            exists a standard ‘hill-climbing’ algorithm to optimize the parameters in
            this equation. The expectation-maximization (EM) algorithm is a gen-
            eral procedure which assumes the following problem definition. We
            have two data spaces: the observed data Z and the so-called missing
            data (hidden data) X = {x_1, ..., x_{N_S}}. Let Y be the complete data in
            which each vector y_n consists of a known part z_n and a missing part x_n.
            Thus, the vectors y_n are defined by y_n^T = [z_n^T  x_n^T]. Only z_n is available
            in the training set, x_n is missing, and as such unknown. The EM algo-
            rithm uses this model of ‘incomplete data’ to iteratively estimate the
            parameters C of the distribution of z_n. It tries to maximize the log-
            likelihood L(Z|C). In fact, if the result of the i-th iteration is denoted
            by C^(i), then the EM algorithm assures that L(Z|C^(i+1)) ≥ L(Z|C^(i)).
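              As a small illustration (a minimal sketch, not one of the listings used
            elsewhere in this book), the log-likelihood (7.14) of a mixture of K
            Gaussian components can be evaluated directly in MATLAB. The names and
            the parameter layout below are assumptions made for this sketch: Z holds
            one sample z_n per column, mu one mean per column, Sigma one covariance
            matrix per page, and prior the mixing weights.

            function L = mog_loglik(Z, mu, Sigma, prior)
            % Sketch: evaluate the log-likelihood L(Z|C) of (7.14) for a mixture of
            % K Gaussians. C is represented by mu (N x K), Sigma (N x N x K) and
            % prior (1 x K); Z is N x NS, one sample per column (assumed layout).
              [N, NS] = size(Z);
              K = numel(prior);
              p = zeros(K, NS);                 % p(k,n) = prior(k)*N(z_n; mu_k, Sigma_k)
              for k = 1:K
                d = Z - repmat(mu(:,k), 1, NS); % centred samples
                c = 1 / sqrt((2*pi)^N * det(Sigma(:,:,k)));
                p(k,:) = prior(k) * c * exp(-0.5 * sum(d .* (Sigma(:,:,k) \ d), 1));
              end
              L = sum(log(sum(p, 1)));          % ln prod_n p(z_n|C) = sum_n ln p(z_n|C)
            end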
              The mathematics behind the EM algorithm is rather technical and will
            be omitted. However, the intuitive idea behind the EM algorithm is as
            follows. The basic assumption is that with all data available, that is if we
            had observed the complete data Y, the maximization of L(Y|C) would
            be simple, and the problem could be solved easily. Unfortunately, only Z
            is available and X is missing. Therefore, we maximize the expectation
            E[L(Y|C)|Z] instead of L(Y|C). The expectation is taken over the com-
            plete data Y, but under the condition of the observed data Z. The
            estimate from the i-th iteration follows from:

            \[
            \begin{aligned}
            E[L(Y|C) \mid Z] &= \int \bigl(\ln p(Y \mid C)\bigr)\, p\bigl(Y \mid Z; C^{(i)}\bigr)\, dY \\
            C^{(i+1)} &= \arg\max_{C} \bigl\{ E[L(Y|C) \mid Z] \bigr\}
            \end{aligned}
            \qquad (7.15)
            \]
            The integral extends over the entire Y space. The first step is the E step;
            the last one is the M step.
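              To make the E and M steps concrete for the Gaussian mixture case
            discussed next, the following sketch performs one EM pass in MATLAB
            under the same assumed parameter layout as above (em_step is an
            illustrative name, not a toolbox routine): the E step computes the
            conditional expectation of the missing x_n given z_n, and the M step
            re-estimates C from these expectations.

            function [mu, Sigma, prior] = em_step(Z, mu, Sigma, prior)
            % Sketch of one EM iteration for a K-component Gaussian mixture
            % (same assumed layout as mog_loglik above).
              [N, NS] = size(Z);
              K = numel(prior);
              % E step: posterior probability that component k generated sample z_n,
              % i.e. the conditional expectation of the missing indicator x_n
              resp = zeros(K, NS);
              for k = 1:K
                d = Z - repmat(mu(:,k), 1, NS);
                c = 1 / sqrt((2*pi)^N * det(Sigma(:,:,k)));
                resp(k,:) = prior(k) * c * exp(-0.5 * sum(d .* (Sigma(:,:,k) \ d), 1));
              end
              resp = resp ./ repmat(sum(resp, 1), K, 1);   % normalize over k
              % M step: re-estimate the parameters C from the expected complete data
              for k = 1:K
                w = resp(k,:);
                prior(k) = sum(w) / NS;
                mu(:,k) = Z * w' / sum(w);
                d = Z - repmat(mu(:,k), 1, NS);
                Sigma(:,:,k) = (d .* repmat(w, N, 1)) * d' / sum(w);
              end
            end

            Iterating em_step and monitoring mog_loglik realizes the guarantee
            mentioned above: each pass does not decrease L(Z|C), so the iteration
            is usually stopped once the increase falls below a small threshold.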
            In the application of EM optimization to a mixture of Gaussians (with
            predefined K; see also the discussion on page 228) the missing part x_n,
            associated with z_n, is a K-dimensional vector. x_n indicates which one of