One method for fitting the parameters of (7.13) to the data is to use
maximum likelihood estimation (see Section 3.1.4), i.e. to optimize the
log-likelihood:
L(\mathbf{Z}|C) \stackrel{\text{def}}{=} \ln p(\mathbf{z}_1, \ldots, \mathbf{z}_{N_S} \mid C) = \ln \prod_{n=1}^{N_S} p(\mathbf{z}_n \mid C) \qquad (7.14)

with Z = {z_1, ..., z_{N_S}}. The optimization of L(Z|C) w.r.t. C is complicated and cannot be done directly with this equation.
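Although (7.14) cannot be maximized in closed form, it is straightforward to evaluate for given parameters. The following MATLAB fragment is a minimal sketch (not a listing from this book) for a K-component Gaussian mixture; the function name mixture_loglik and the representation of C as a weight vector w, a mean matrix mu and a covariance array Sigma are our own illustrative choices:

function L = mixture_loglik(Z, w, mu, Sigma)
% Evaluates the log-likelihood (7.14) of a K-component Gaussian mixture.
% Z: N_S x D matrix with one measurement z_n per row; w: 1 x K mixing
% weights; mu: K x D component means; Sigma: D x D x K covariances.
[NS, D] = size(Z);
K = numel(w);
p = zeros(NS, 1);                  % p(z_n|C), accumulated over components
for k = 1:K
  d  = Z - repmat(mu(k,:), NS, 1); % deviations from the k-th mean
  Ck = Sigma(:,:,k);
  g  = exp(-0.5*sum((d/Ck).*d, 2)) / sqrt((2*pi)^D * det(Ck));
  p  = p + w(k)*g;                 % weighted sum of component densities
end
L = sum(log(p));                   % ln prod_n p(z_n|C) = sum_n ln p(z_n|C)
end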
Fortunately, there exists a standard ‘hill-climbing’ algorithm to optimize the parameters in this equation. The expectation-maximization (EM) algorithm is a general procedure which assumes the following problem definition. We have two data spaces: the observed data Z and the so-called missing data (hidden data) X = {x_1, ..., x_{N_S}}. Let Y be the complete data in which each vector y_n consists of a known part z_n and a missing part x_n. Thus, the vectors y_n are defined by y_n = [z_n^T x_n^T]^T. Only z_n is available in the training set; x_n is missing, and as such unknown. The EM algorithm uses this model of ‘incomplete data’ to iteratively estimate the parameters C of the distribution of z_n. It tries to maximize the log-likelihood L(Z|C). In fact, if the result of the i-th iteration is denoted by C^(i), then the EM algorithm assures that L(Z|C^(i+1)) ≥ L(Z|C^(i)).
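In code, this iteration and its guarantee take the following shape. This is a sketch under our own conventions: em_step is a hypothetical placeholder for the problem-specific E and M computations (a Gaussian-mixture version is sketched below (7.15)), C is a parameter struct with fields w, mu and Sigma, and maxiter and tol are user-chosen constants:

C = C0;                            % initial guess of the parameters
Lprev = -inf;
for i = 1:maxiter
  C = em_step(Z, C);               % one E step followed by one M step
  L = mixture_loglik(Z, C.w, C.mu, C.Sigma);
  % EM assures L(Z|C^(i+1)) >= L(Z|C^(i)): L never decreases
  if L - Lprev < tol, break; end   % stop when the increase is negligible
  Lprev = L;
end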
The mathematics behind the EM algorithm is rather technical and will
be omitted. However, the intuitive idea behind the EM algorithm is as
follows. The basic assumption is that with all data available, that is, if we had observed the complete data Y, the maximization of L(Y|C) would be simple, and the problem could be solved easily. Unfortunately, only Z is available and X is missing. Therefore, we maximize the expectation E[L(Y|C)|Z] instead of L(Y|C). The expectation is taken over the complete data Y, but under the condition of the observed data Z. The
estimate from the i-th iteration follows from:
\mathrm{E}[L(\mathbf{Y}|C) \mid \mathbf{Z}] = \int \left( \ln p(\mathbf{Y}|C) \right) p\!\left(\mathbf{Y} \mid \mathbf{Z}, C^{(i)}\right) \mathrm{d}\mathbf{Y}

C^{(i+1)} = \arg\max_{C} \left\{ \mathrm{E}[L(\mathbf{Y}|C) \mid \mathbf{Z}] \right\} \qquad (7.15)
The integral extends over the entire Y space. The first step is the E step;
the last one is the M step.
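For the mixture-of-Gaussians case discussed next, both steps of (7.15) have closed forms: the E step reduces to computing the posterior probability of each component given z_n and C^(i), and the M step to ML estimates weighted by these probabilities. A minimal sketch of one such iteration (our own illustration, again with C represented by the fields w, mu and Sigma):

function C = em_step(Z, C)
% One EM iteration for a K-component Gaussian mixture (illustrative
% sketch; C.w, C.mu, C.Sigma are our own representation of C).
[NS, D] = size(Z);
K = numel(C.w);
R = zeros(NS, K);                  % E step: posterior component probabilities
for k = 1:K
  d  = Z - repmat(C.mu(k,:), NS, 1);
  Ck = C.Sigma(:,:,k);
  R(:,k) = C.w(k) * exp(-0.5*sum((d/Ck).*d, 2)) / sqrt((2*pi)^D * det(Ck));
end
R  = R ./ repmat(sum(R, 2), 1, K); % normalize: each row sums to one
Nk = sum(R, 1);                    % effective sample count per component
for k = 1:K                        % M step: weighted ML re-estimates
  C.mu(k,:)      = R(:,k)' * Z / Nk(k);
  d              = Z - repmat(C.mu(k,:), NS, 1);
  C.Sigma(:,:,k) = (d' * (d .* repmat(R(:,k), 1, D))) / Nk(k);
  C.w(k)         = Nk(k) / NS;
end
end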
In the application of EM optimization to a mixture of Gaussians (with predefined K; see also the discussion on page 228) the missing part x_n, associated with z_n, is a K-dimensional vector. x_n indicates which one of