Page 258 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB


CLUSTERING 247

some Gaussian noise with variance $\sigma^2$ in all directions. The full model
for the probability of observing a vector z is:

$$p(z \mid q; W, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{D/2} \exp\left(-\frac{\lVert f(q; W) - z \rVert^2}{2\sigma^2}\right) \qquad (7.36)$$

In general, the distribution of z in the high dimensional space can be
found by integration over the latent variable q:

$$p(z \mid W, \sigma^2) = \int p(z \mid q; W, \sigma^2)\, p(q)\, dq \qquad (7.37)$$

In order to allow an analytical solution of this integral, a simple grid-like
probability model is chosen for p(q), just like in the SOM:

$$p(q) = \frac{1}{K} \sum_{k=1}^{K} \delta(q - q_k) \qquad (7.38)$$

i.e. a set of Dirac functions centred on grid nodes $q_k$. The log-likelihood
of the complete model can then be written as:

$$\ln L(W, \sigma^2) = \sum_{n=1}^{N_S} \ln\left(\frac{1}{K} \sum_{k=1}^{K} p(z_n \mid q_k; W, \sigma^2)\right) \qquad (7.39)$$

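Substituting (7.38) into (7.37) collapses the integral to a finite sum over the K grid nodes, which is why (7.39) can be evaluated directly. The following Python sketch illustrates that computation. It is not taken from the book: the function name `gtm_log_likelihood`, its argument names and the use of plain lists are assumptions made for illustration. The mapped nodes f(q_k; W) are taken as given, and the log-sum-exp rearrangement is an implementation choice to avoid numerical underflow, not part of the model.

```python
import math

def gtm_log_likelihood(samples, mapped_nodes, sigma2):
    """Evaluate ln L(W, sigma^2) as in (7.39).

    samples      -- list of measurement vectors z_n (hypothetical name)
    mapped_nodes -- list of the K vectors f(q_k; W), computed elsewhere
    sigma2       -- isotropic noise variance sigma^2
    """
    dim = len(samples[0])
    num_nodes = len(mapped_nodes)
    # Log of the Gaussian normalisation constant in (7.36).
    log_norm = -0.5 * dim * math.log(2.0 * math.pi * sigma2)
    total = 0.0
    for z in samples:
        # ln p(z_n | q_k; W, sigma^2) for every grid node k.
        log_p = [
            log_norm - sum((zi - fi) ** 2 for zi, fi in zip(z, f)) / (2.0 * sigma2)
            for f in mapped_nodes
        ]
        # ln((1/K) * sum_k exp(log_p[k])), computed stably.
        peak = max(log_p)
        total += peak + math.log(sum(math.exp(lp - peak) for lp in log_p))
        total -= math.log(num_nodes)
    return total
```

With a single sample lying exactly on a single mapped node, the result reduces to the log of the Gaussian normalisation constant, which makes a convenient sanity check.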
Still, the functional form for the mapping function f(q; W) has to be
defined. This function maps the low dimensional grid to a manifold in
the high dimensional space. Therefore, its form controls how nonlinear
the manifold can become. In the GTM, a regression on a set of fixed
basis functions is used:

$$f(q; W) = W g(q) \qquad (7.40)$$

g(q) is a vector containing the outputs of M basis functions, which are
usually chosen to be Gaussian with means on the grid points and a fixed
width $\sigma_\varphi$. W is an N × M weight matrix.
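The mapping (7.40) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `gaussian_basis` and `map_to_data_space` are hypothetical, and the unnormalised form exp(-||q - mu_m||^2 / (2 width^2)) for each basis function is an assumption, since the text states only that the basis functions are Gaussian with a fixed width.

```python
import math

def gaussian_basis(q, basis_centres, width):
    """Return g(q): the outputs of the M Gaussian basis functions at latent point q.

    Assumes unnormalised Gaussians with a shared width (an illustrative choice).
    """
    return [
        math.exp(-sum((qi - mi) ** 2 for qi, mi in zip(q, mu)) / (2.0 * width ** 2))
        for mu in basis_centres
    ]

def map_to_data_space(q, weights, basis_centres, width):
    """Return f(q; W) = W g(q), with W given as a list of rows (one row per data dimension)."""
    g = gaussian_basis(q, basis_centres, width)
    return [sum(w * gm for w, gm in zip(row, g)) for row in weights]

# Example: a 1-D latent space, M = 2 basis functions, 3-D data space.
centres = [[0.0], [1.0]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
point = map_to_data_space([0.0], W, centres, 1.0)
```

Because W enters linearly, the flexibility of the manifold is governed by the number of basis functions M and their width: wider basis functions give a smoother, less curved mapping.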
Given settings for K, M and $\sigma_\varphi$, the EM algorithm can be used to
estimate W and $\sigma^2$. Let the complete data be $y_n = [z_n^T \; x_n^T]^T$, with $x_n$ the
hidden variables. $x_n$ is a K-dimensional vector. The element $x_{n,k}$ codes
