Page 310 - Computational Statistics Handbook with MATLAB


Chapter 8: Probability Density Estimation

2. Determine an initial guess at the component parameters. These are
the mixing coefficients, means and covariance matrices for each
normal density.
3. For each data point x_j, calculate the posterior probability using
Equation 8.34.
4. Update the mixing coefficients, the means and the covariance
matrices for the individual components using Equations 8.36 through
8.38.
5. Repeat steps 3 and 4 until the estimates converge.

Typically, step 5 is implemented by continuing the iteration until the changes
in the estimates at each iteration are less than some pre-set tolerance. Note
that with the iterative EM algorithm, we need to use the entire data set to
simultaneously update the parameter estimates. This imposes a high
computational load when dealing with massive data sets.
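
Steps 3 through 5 can be sketched in MATLAB as follows. This is a minimal illustration, not the handbook's implementation: it assumes the standard EM updates for a normal mixture (posterior-weighted averages for the mixing coefficients, means and covariances), and the variable names (mix, mu, sigma, post), the starting values, the tolerance tol and the iteration cap maxit are all hypothetical choices. Every pass through the loop touches all n observations, which is the source of the computational load noted above.

```matlab
% Sketch of the EM iteration for a c-term multivariate normal mixture.
% Names, starting values, tol and maxit are illustrative assumptions.
n = 200; d = 2; c = 2;
nhalf = n/2;
% Synthetic data: unit-covariance normals centered at (-2,2) and (2,0).
data = [randn(nhalf,d) + repmat([-2 2],nhalf,1); ...
        randn(n-nhalf,d) + repmat([2 0],n-nhalf,1)];
% Step 2: initial guesses (hypothetical values).
mix = [0.3 0.7];                    % mixing coefficients, 1 x c
mu = [-1 1; 1 -1];                  % means, one row per component
sigma = repmat(eye(d),[1 1 c]);     % covariances, d x d x c
tol = 1e-6;                         % convergence tolerance
maxit = 500;                        % safeguard against endless looping
for it = 1:maxit
    % Step 3: posterior probability of each component for every point.
    dens = zeros(n,c);
    for i = 1:c
        xc = data - repmat(mu(i,:),n,1);      % centered data
        S = sigma(:,:,i);
        q = sum((xc/S).*xc,2);                % squared Mahalanobis distances
        dens(:,i) = mix(i)*exp(-0.5*q)/sqrt((2*pi)^d*det(S));
    end
    post = dens./repmat(sum(dens,2),1,c);     % each row sums to one
    % Step 4: update the estimates using all n points at once.
    oldmix = mix; oldmu = mu; oldsigma = sigma;
    mix = sum(post,1)/n;
    for i = 1:c
        w = post(:,i);
        mu(i,:) = (w'*data)/sum(w);
        xc = data - repmat(mu(i,:),n,1);
        sigma(:,:,i) = (xc'*(xc.*repmat(w,1,d)))/sum(w);
    end
    % Step 5: stop when the largest parameter change falls below tol.
    delta = max([abs(mix(:)-oldmix(:)); abs(mu(:)-oldmu(:)); ...
                 abs(sigma(:)-oldsigma(:))]);
    if delta < tol
        break;
    end
end
```

After the loop, mix, mu and sigma hold the final estimates and it records the number of iterations used. Measuring convergence by the largest absolute change in any parameter is one simple choice; monitoring the change in the log-likelihood is a common alternative.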

Example 8.11
In this example, we provide the MATLAB code that implements the multivariate
EM algorithm for estimating the parameters of a finite mixture probability
density model. To illustrate this, we will generate a data set that is a
mixture of two terms with equal mixing coefficients. One term is centered at
the point (-2, 2) and the other is centered at (2, 0). The covariance of each
component density is given by the identity matrix. Our first step is to
generate 200 data points from this distribution.

% Create some artificial two-term mixture data.
n = 200;
data = zeros(n,2);
% Now generate 200 random variables. First find
% the number that come from each component.
r = rand(1,n);
% Find the number generated from component 1.
ind = sum(r <= 0.5);
% Create some mixture data. Note that the
% component densities are multivariate normals.
% Generate the first term.
data(1:ind,1) = randn(ind,1) - 2;
data(1:ind,2) = randn(ind,1) + 2;
% Generate the second term.
data(ind+1:n,1) = randn(n-ind,1) + 2;
data(ind+1:n,2) = randn(n-ind,1);
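
Before fitting the model, it can be worth checking that the generated sample looks as intended. The following sketch repeats the generation step so that it runs on its own, then computes the sample mean and covariance of each term; the names m1, m2, S1 and S2 are hypothetical and are not part of the example's code.

```matlab
% Regenerate the two-term mixture data, then check each term.
n = 200;
data = zeros(n,2);
r = rand(1,n);
ind = sum(r <= 0.5);                 % number of points from component 1
data(1:ind,1) = randn(ind,1) - 2;
data(1:ind,2) = randn(ind,1) + 2;
data(ind+1:n,1) = randn(n-ind,1) + 2;
data(ind+1:n,2) = randn(n-ind,1);
% Sample means should be near (-2,2) and (2,0), and the sample
% covariances should be near the 2 x 2 identity matrix.
m1 = mean(data(1:ind,:));
m2 = mean(data(ind+1:n,:));
S1 = cov(data(1:ind,:));
S2 = cov(data(ind+1:n,:));
fprintf('Component 1: %d points, mean (%.2f, %.2f)\n', ind, m1(1), m1(2));
fprintf('Component 2: %d points, mean (%.2f, %.2f)\n', n-ind, m2(1), m2(2));
```

Because the rows are stored in component order, rows 1 through ind belong to the first term and the rest to the second. The EM algorithm does not use this ordering; it is only convenient for checks of this kind.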
We must then specify various parameters for the EM algorithm, such as the
number of terms.
c = 2; % number of terms
