Page 199 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB
188 FEATURE EXTRACTION AND SELECTION
An alternative way to represent these distances is by means of scatter
matrices. A scatter matrix gives some information about the dispersion
of a population of samples around their mean. For instance, the matrix
that describes the scattering of vectors from class $\omega_k$ is:
$$S_k = \frac{1}{N_k}\sum_{n=1}^{N_k}\left(z_{k,n}-\hat{\mu}_k\right)\left(z_{k,n}-\hat{\mu}_k\right)^{T} \qquad (6.5)$$
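Equation (6.5) can be evaluated directly. The following is a minimal NumPy sketch that uses NumPy as a stand-in for the MATLAB used elsewhere in the book. It assumes the samples of one class are stored as the rows of an array. The function name `class_scatter`, the dimension symbol `d`, and the three-sample data set are hypothetical and serve only as illustration.

```python
import numpy as np


def class_scatter(Z_k):
    """Scatter matrix S_k of eq. (6.5) for the samples of a single class.

    Z_k: array of shape (N_k, d) whose rows are the vectors z_{k,n}
    (d, the dimension of z, is notation assumed for this sketch).
    """
    Z_k = np.asarray(Z_k, dtype=float)
    mu_hat_k = Z_k.mean(axis=0)        # class sample mean
    D = Z_k - mu_hat_k                 # rows are z_{k,n} - mu_hat_k
    # D.T @ D is the sum over n of the outer products (z_{k,n} - mu_hat_k)(...)^T
    return D.T @ D / Z_k.shape[0]


# Made-up class with N_k = 3 two-dimensional samples.
Z_k = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
S_k = class_scatter(Z_k)
print(S_k)  # equals [[8, -4], [-4, 8]] / 3
```

Writing the sum of outer products as a single matrix product `D.T @ D` avoids an explicit loop over the samples. The two forms give the same result.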
Comparison with equation (5.14) shows that $S_k$ is close to an unbiased estimate of the class-dependent covariance matrix. It differs from that estimate only in its normalization, $1/N_k$ instead of $1/(N_k-1)$. In fact, $S_k$ is the maximum likelihood estimate of $C_k$. Consequently, $S_k$ supplies information not only about the average distance of the scattering, but also about its eccentricity and orientation. This is analogous to the properties of a covariance matrix.
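Both properties can be checked numerically. The following is a small NumPy sketch on made-up data. It shows that $S_k$ coincides with NumPy's `np.cov(..., bias=True)` (normalization $1/N_k$) and differs from the default unbiased `np.cov` by the factor $(N_k-1)/N_k$. It then reads eccentricity and orientation from the eigen-decomposition of $S_k$. The axis ratio used here as a measure of eccentricity and the angle convention (degrees in $[0, 180)$) are choices made for this sketch only.

```python
import numpy as np

# Made-up samples of one class (N_k = 3, two-dimensional).
Z_k = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
N_k = Z_k.shape[0]

S_k = np.cov(Z_k, rowvar=False, bias=True)   # normalized by 1/N_k: maximum likelihood estimate
C_unbiased = np.cov(Z_k, rowvar=False)       # normalized by 1/(N_k - 1): unbiased estimate
print(np.allclose(S_k, C_unbiased * (N_k - 1) / N_k))  # True

# Eccentricity and orientation follow from the eigen-decomposition of S_k.
eigvals, eigvecs = np.linalg.eigh(S_k)       # eigenvalues in ascending order
spreads = np.sqrt(eigvals)                   # standard deviation along each principal axis
axis_ratio = spreads[-1] / spreads[0]        # 1 for a circular scatter, larger when elongated
major = eigvecs[:, -1]                       # direction of largest spread
angle = np.degrees(np.arctan2(major[1], major[0])) % 180.0  # orientation in [0, 180) degrees
print(spreads, axis_ratio, angle)
```

For this data the eigenvalues of $S_k$ are $4/3$ and $4$. The major axis therefore points along $(1, -1)$, which corresponds to 135 degrees, and the major axis is $\sqrt{3}$ times as long as the minor axis. Taking the angle modulo 180 degrees removes the arbitrary sign of the eigenvector.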
Averaged over all classes, the scatter matrix describing the noise is:
$$S_w = \frac{1}{N_S}\sum_{k=1}^{K} N_k S_k = \frac{1}{N_S}\sum_{k=1}^{K}\sum_{n=1}^{N_k}\left(z_{k,n}-\hat{\mu}_k\right)\left(z_{k,n}-\hat{\mu}_k\right)^{T} \qquad (6.6)$$
This matrix is called the within-scatter matrix because it describes the average scattering within classes. Here, $N_S = \sum_{k=1}^{K} N_k$ is the total number of samples. Complementary to this is the between-scatter matrix $S_b$, which describes the scattering of the class-dependent sample means $\hat{\mu}_k$ around the overall sample mean $\hat{\mu}$:
$$S_b = \frac{1}{N_S}\sum_{k=1}^{K} N_k\left(\hat{\mu}_k-\hat{\mu}\right)\left(\hat{\mu}_k-\hat{\mu}\right)^{T} \qquad (6.7)$$
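The following is a minimal NumPy sketch of equations (6.6) and (6.7). It assumes the training set is stored as an $(N_S, d)$ array of samples with a separate array of class labels. The function name `within_between_scatter` and the synthetic three-class data are hypothetical. As a sanity check, the sketch uses a standard identity that this section does not state: $S_w + S_b$ equals the scatter of all samples about the overall mean. The identity holds because the deviations from each class mean sum to zero, so the cross terms vanish.

```python
import numpy as np


def within_between_scatter(Z, labels):
    """Within-scatter S_w (eq. 6.6) and between-scatter S_b (eq. 6.7).

    Z: array of shape (N_S, d), one sample per row.
    labels: length-N_S array with the class label of each sample.
    """
    Z = np.asarray(Z, dtype=float)
    labels = np.asarray(labels)
    N_S, d = Z.shape
    mu_hat = Z.mean(axis=0)                  # overall (mixture) sample mean
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for k in np.unique(labels):
        Z_k = Z[labels == k]
        N_k = Z_k.shape[0]
        mu_hat_k = Z_k.mean(axis=0)          # class sample mean
        D = Z_k - mu_hat_k
        S_w += D.T @ D                       # accumulates N_k * S_k
        diff = (mu_hat_k - mu_hat)[:, None]  # column vector mu_hat_k - mu_hat
        S_b += N_k * (diff @ diff.T)
    return S_w / N_S, S_b / N_S


# Synthetic training set: three Gaussian classes of 50 samples each.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [4.0, 1.0], [1.0, 5.0]])
Z = np.vstack([rng.normal(m, 1.0, size=(50, 2)) for m in means])
labels = np.repeat([0, 1, 2], 50)

S_w, S_b = within_between_scatter(Z, labels)
S_t = np.cov(Z, rowvar=False, bias=True)     # scatter of all samples about the mixture mean
print(np.allclose(S_w + S_b, S_t))  # True
```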
Figure 6.2 illustrates the concepts of within-scatter and between-scatter matrices. The figure shows a scatter diagram of a training set consisting of four classes. A scatter matrix $S$ corresponds to an ellipse, $z^{T} S^{-1} z = 1$, that can be thought of as a contour roughly surrounding the associated population of samples. Strictly speaking, the correspondence holds only if the underlying probability density is Gaussian-like. But even if the densities are not Gaussian, the ellipses give an impression of how the population is scattered. In the scatter diagram of Figure 6.2, the within-scatter $S_w$ is represented by four similar ellipses positioned at the four conditional sample means. The between-scatter $S_b$ is depicted by an ellipse centred at the mixture sample mean.
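Ellipses like those in Figure 6.2 can be generated from a scatter matrix. The following is one possible NumPy sketch, assuming a two-dimensional, positive definite $S$. It factors $S = LL^{T}$ with a Cholesky decomposition and maps the unit circle $u$ to $z = c + Lu$. This gives $(z-c)^{T} S^{-1} (z-c) = u^{T}u = 1$, so the points lie on the contour $z^{T} S^{-1} z = 1$ shifted to a chosen centre $c$ (for example, a class sample mean). The function name `ellipse_points` and the example values are hypothetical.

```python
import numpy as np


def ellipse_points(S, center, num=100):
    """Points z on the contour (z - center)^T S^{-1} (z - center) = 1.

    S: 2x2 positive definite scatter matrix; center: length-2 array.
    """
    L = np.linalg.cholesky(S)                          # S = L @ L.T, L lower-triangular
    t = np.linspace(0.0, 2.0 * np.pi, num, endpoint=False)
    unit_circle = np.vstack([np.cos(t), np.sin(t)])    # shape (2, num), u^T u = 1
    return (np.asarray(center)[:, None] + L @ unit_circle).T  # shape (num, 2)


# Example: the ellipse of a made-up class scatter matrix around its mean.
S = np.array([[8.0, -4.0], [-4.0, 8.0]]) / 3.0
center = np.array([3.0, 2.0])
pts = ellipse_points(S, center, num=8)

# Every point satisfies the quadratic form of the contour.
D = pts - center
q = np.einsum("ni,ij,nj->n", D, np.linalg.inv(S), D)
print(np.allclose(q, 1.0))  # True
```

Any matrix square root of $S$ would work in place of the Cholesky factor. The Cholesky factor is used here because it is cheap to compute and always exists for a positive definite $S$.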