Page 80 - Algorithm Collections for Digital Signal Processing Applications using MATLAB

           2.       GAUSSIAN MIXTURE MODEL

              Consider data collected from a group of people about their age and
           height, represented as two-dimensional vectors. Let the first element of
           each vector be the age and the second element be the height of the
           corresponding person: $P_1 = [a_1\ h_1]$, $P_2 = [a_2\ h_2]$, ...,
           $P_m = [a_m\ h_m]$. The problem is to classify the collected vectors into
           $n$ categories, called $n$ clusters, such that the centroids of the
           clusters are far from each other and the data belonging to the same
           cluster are close to each other, as shown in Figure 2-1. The dots in the
           figure mark the centroids of the individual clusters. This problem is
           called clustering, or grouping, the data.
              Let us model the probability of a particular vector $x$ from the
           collected data as a linear combination of multivariate Gaussian density
           functions:

              $p(x) = p(c_1)\,G(x; m_1, \mathrm{cov}_1) + p(c_2)\,G(x; m_2, \mathrm{cov}_2) + \dots + p(c_n)\,G(x; m_n, \mathrm{cov}_n)$

           where $G(x; m_k, \mathrm{cov}_k)$ denotes the Gaussian density function of
           $x$ with mean vector $m_k$ and covariance matrix $\mathrm{cov}_k$, and
           $p(c_k)$ is the probability of cluster $k$.

              That is,

              $p(x) = p(c_1)\,p(x \mid c_1) + p(c_2)\,p(x \mid c_2) + p(c_3)\,p(x \mid c_3) + \dots + p(c_n)\,p(x \mid c_n)$
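
The mixture density above can be evaluated directly for one data vector. The following is a minimal sketch, not the book's own listing: the two-cluster parameter values, the variable names (`pc`, `m`, `covs`) and the helper `gauss` are all hypothetical and chosen only for illustration. The Gaussian density is written out by hand so that no toolbox function is assumed.

```matlab
% Hypothetical two-cluster example; all numbers are illustrative only.
x    = [30 165];                              % one data vector [age height]
pc   = [0.4 0.6];                             % cluster probabilities p(c_1), p(c_2); they sum to 1
m    = [25 160; 45 172];                      % row k is the mean vector m_k
covs = cat(3, [16 4; 4 36], [25 5; 5 49]);    % covs(:,:,k) is the covariance matrix cov_k

% Multivariate Gaussian density of row vector x with mean mu and covariance S
gauss = @(x, mu, S) exp(-0.5 * (x - mu) / S * (x - mu)') / sqrt((2*pi)^numel(x) * det(S));

% p(x) = sum over k of p(c_k) * p(x | c_k)
px = 0;
for k = 1:numel(pc)
    px = px + pc(k) * gauss(x, m(k,:), covs(:,:,k));
end
disp(px)
```

The expression `(x - mu) / S` solves a linear system instead of forming the inverse of the covariance matrix explicitly, which is usually the more stable choice.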

              The model described above is called the Gaussian Mixture Model. Given
           the data $(x_1, x_2, x_3, x_4, \dots, x_m)$, the modeling task is to obtain
           the mean vectors $m_1, m_2, \dots$, the covariance matrices
           $\mathrm{cov}_1, \mathrm{cov}_2, \dots$, and the cluster probabilities
           $p(c_1), p(c_2), \dots$ such that the probability of the collected data is
           maximized. As the collected data are independent of each other, the
           probability of the collected data, $P_D$, is the product of the
           individual probabilities:

              $P_D = p(x_1)\,p(x_2) \cdots p(x_m)$
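
The product of the per-vector probabilities can be computed for a given parameter set as sketched below. This is an illustrative sketch only: the five data rows, the parameter values and the names `X`, `pc`, `m`, `covs`, `gauss`, `PD` and `logPD` are hypothetical. The logarithm of the product is also computed; this is a common numerical convenience and not something the text above prescribes.

```matlab
% Hypothetical data and parameters; all numbers are illustrative only.
X    = [22 158; 28 163; 41 170; 50 175; 47 168];   % row i is x_i = [age height]
pc   = [0.4 0.6];                                  % cluster probabilities p(c_k)
m    = [25 160; 45 172];                           % row k is the mean vector m_k
covs = cat(3, [16 4; 4 36], [25 5; 5 49]);         % covs(:,:,k) is cov_k

% Multivariate Gaussian density of row vector x with mean mu and covariance S
gauss = @(x, mu, S) exp(-0.5 * (x - mu) / S * (x - mu)') / sqrt((2*pi)^numel(x) * det(S));

M  = size(X, 1);          % number of collected vectors
px = zeros(M, 1);         % px(i) will hold the mixture density p(x_i)
for i = 1:M
    for k = 1:numel(pc)
        px(i) = px(i) + pc(k) * gauss(X(i,:), m(k,:), covs(:,:,k));
    end
end

PD    = prod(px);         % probability of the collected data, p(x_1) p(x_2) ... p(x_M)
logPD = sum(log(px));     % logarithm of the same product
disp([PD logPD])
```

Because the logarithm is monotonic, the parameters that maximize `logPD` also maximize `PD`, and the sum of logarithms does not underflow when the number of vectors is large.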