Classification, Parameter Estimation and State Estimation: An Engineering Approach Using MATLAB
The $D \times N$ matrix $W_D$ transforms the $N$-dimensional measurement space to a $D$-dimensional feature space. Ideally, the transform is such that $y$ is a good representation of $z$ despite the lower dimension of $y$. This objective is pursued by selecting $W_D$ such that an (unbiased)¹ linear MMSE estimate $\hat{z}_{lMMSE}$ for $z$ based on $y$ yields a minimum mean square error (see Section 3.1.5):

$$W_D = \arg\min_{W} \left\{ E\left[ \left\| \hat{z}_{lMMSE}(y) - z \right\|^2 \right] \right\} \quad \text{with } y = Wz \tag{7.3}$$
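To make the objective in (7.3) concrete, the following Python/NumPy sketch evaluates the mean square error of the linear MMSE reconstruction of a zero-mean $z$ from $y = Wz$ for one candidate $W$. It assumes the standard zero-mean linear MMSE form $\hat{z}_{lMMSE}(y) = C_{zy} C_y^{-1} y$ with $C_{zy} = C_z W^T$ and $C_y = W C_z W^T$ (the latter assumed invertible). The helper names `lmmse_matrix` and `lmmse_mse` and the random covariance are hypothetical, chosen only for illustration.

```python
import numpy as np

def lmmse_matrix(W, C_z):
    """Hypothetical helper: K such that z_hat = K @ y is the linear MMSE
    estimate of zero-mean z from y = W @ z. Assumes W C_z W^T is invertible."""
    C_zy = C_z @ W.T                          # cross-covariance E[z y^T]
    C_y = W @ C_z @ W.T                       # covariance E[y y^T]
    return np.linalg.solve(C_y, C_zy.T).T     # C_zy @ inv(C_y), using C_y symmetric

def lmmse_mse(W, C_z):
    """Hypothetical helper: objective of (7.3), E||z_hat_lMMSE(y) - z||^2."""
    K = lmmse_matrix(W, C_z)
    E = K @ W - np.eye(C_z.shape[0])          # estimation error is (K W - I) z
    return float(np.trace(E @ C_z @ E.T))     # E||(KW - I) z||^2

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
C_z = B @ B.T + 0.1 * np.eye(4)               # hypothetical covariance of z (N = 4)
W = rng.standard_normal((2, 4))               # one candidate D x N matrix (D = 2)
print(lmmse_mse(W, C_z))
```

Minimizing `lmmse_mse` over all $D \times N$ matrices is the problem posed in (7.3); the sketch only evaluates the objective for a single candidate.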
It is easy to see that this objective function does not provide a unique solution for $W_D$. If a minimum is reached for some $W_D$, then any matrix $AW_D$ is another solution with the same minimum (provided that $A$ is invertible), because the linear MMSE procedure undoes the transformation $A$. For uniqueness, we add two requirements. First, we require that the information carried in the individual elements of $y$ adds up individually. By this we mean that if $y$ is the optimal $D$-dimensional representation of $z$, then the optimal $(D-1)$-dimensional representation is obtained from $y$ simply by deleting its least informative element. Usually, the elements of $y$ are sorted in decreasing order of importance, so that the least informative element is always the last element. With this convention, the matrix $W_{D-1}$ is obtained from $W_D$ simply by deleting the last row of $W_D$.
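The non-uniqueness argument can be checked numerically. This sketch, again assuming the zero-mean linear MMSE estimator with $C_y = W C_z W^T$ invertible, compares the objective of (7.3) for $W_D$ and $AW_D$ with a random (almost surely invertible) $A$, and forms $W_{D-1}$ by deleting the last row. The MSE is computed in the closed form $\operatorname{tr}(C_z) - \operatorname{tr}(C_{zy} C_y^{-1} C_{yz})$; the helper `lmmse_mse` and all matrices are hypothetical.

```python
import numpy as np

def lmmse_mse(W, C_z):
    """Hypothetical helper: objective of (7.3) for zero-mean z and y = W z.
    Assumes C_y = W C_z W^T is invertible."""
    C_zy = C_z @ W.T                  # cross-covariance E[z y^T]
    C_y = W @ C_z @ W.T               # covariance of y
    # tr(C_z) - tr(C_zy C_y^{-1} C_yz): error left after the linear MMSE estimate
    return float(np.trace(C_z) - np.trace(C_zy @ np.linalg.solve(C_y, C_zy.T)))

rng = np.random.default_rng(1)
N, D = 5, 3
B = rng.standard_normal((N, N))
C_z = B @ B.T + 0.1 * np.eye(N)       # hypothetical covariance of z
W_D = rng.standard_normal((D, N))     # an arbitrary candidate W_D
A = rng.standard_normal((D, D))       # hypothetical A, invertible with probability 1

mse_W = lmmse_mse(W_D, C_z)
mse_AW = lmmse_mse(A @ W_D, C_z)      # the MMSE estimator undoes A, so the error is unchanged
print(mse_W, mse_AW)

W_D_minus_1 = W_D[:-1, :]             # delete the last row -> (D-1) x N
print(lmmse_mse(W_D_minus_1, C_z))
```

Dropping a row can only keep or increase the error, since the estimator then uses less of $y$. For an arbitrary $W_D$, as here, the truncated matrix need not be the optimal $W_{D-1}$; that property is what the requirement imposes.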
This requirement implies that the elements of $y$ must be uncorrelated. If they were not, the least informative element would still carry predictive information about the other elements of $y$, which conflicts with our requirement. Hence, the covariance matrix $C_y$ of $y$ must be a diagonal matrix, say $\Lambda_D$. If $C_z$ is the covariance matrix of $z$, then:

$$C_y = W_D C_z W_D^T = \Lambda_D \tag{7.4}$$
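Equation (7.4) rests on the identity $C_y = W_D C_z W_D^T$, which holds for any linear map $y = W_D z$. The sketch below checks it on hypothetical random data (it holds exactly for sample covariances too; `np.cov` treats rows as variables by default) and shows that an arbitrary $W_D$ leaves nonzero off-diagonal covariance, so requiring $C_y$ to be diagonal genuinely constrains $W_D$. The helper `offdiag_norm` and the data are illustrative assumptions.

```python
import numpy as np

def offdiag_norm(C):
    """Hypothetical helper: Frobenius norm of the off-diagonal part of C."""
    return float(np.linalg.norm(C - np.diag(np.diag(C))))

rng = np.random.default_rng(3)
N, D, n_samples = 4, 2, 10_000
B = rng.standard_normal((N, N))
Z = B @ rng.standard_normal((N, n_samples))  # hypothetical zero-mean data, one column per sample
W_D = rng.standard_normal((D, N))            # an arbitrary D x N matrix
Y = W_D @ Z                                  # y = W_D z for every sample

C_z = np.cov(Z)                              # sample covariance, rows are variables
C_y = np.cov(Y)
# Left-hand identity of (7.4): C_y = W_D C_z W_D^T
print(np.allclose(C_y, W_D @ C_z @ W_D.T))
# A generic W_D does not decorrelate the elements of y
print(offdiag_norm(C_y))
```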
For $D = N$ it follows that $C_z W_N^T = W_N^T \Lambda_N$ because $W_N$ is an invertible matrix (in fact, an orthogonal matrix) and $W_N^T W_N$ must be a diagonal matrix (see Appendix B.5 and C.3.2). As $\Lambda_N$ is a diagonal matrix, the columns of $W_N^T$ must be eigenvectors of $C_z$. The diagonal elements of $\Lambda_N$ are the corresponding eigenvalues.
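The eigenvector result can be illustrated with `numpy.linalg.eigh`, which returns the eigenvalues of a symmetric matrix in ascending order together with orthonormal eigenvectors as columns. The sketch below (with a hypothetical helper `pca_transform` and a random covariance) sorts them in decreasing order of importance, builds $W_N$ with those eigenvectors as rows, and checks $C_z W_N^T = W_N^T \Lambda_N$ and (7.4) for $D = N$. Because `eigh` returns unit-length eigenvectors, this particular choice also fixes the scale of each row.

```python
import numpy as np

def pca_transform(C_z):
    """Hypothetical helper: return (W_N, lam), where the rows of W_N are the
    eigenvectors of the symmetric matrix C_z, sorted by decreasing eigenvalue lam."""
    lam, V = np.linalg.eigh(C_z)       # ascending eigenvalues, orthonormal eigenvector columns
    order = np.argsort(lam)[::-1]      # decreasing order of importance
    lam, V = lam[order], V[:, order]
    return V.T, lam                    # columns of W_N^T are the eigenvectors

rng = np.random.default_rng(4)
B = rng.standard_normal((5, 5))
C_z = B @ B.T                          # hypothetical covariance matrix of z (N = 5)
W_N, lam = pca_transform(C_z)
Lambda_N = np.diag(lam)

print(np.allclose(C_z @ W_N.T, W_N.T @ Lambda_N))  # C_z W_N^T = W_N^T Lambda_N
print(np.allclose(W_N @ C_z @ W_N.T, Lambda_N))    # (7.4) with D = N
W_D = W_N[:2]                          # keep the first D = 2 rows
```

Keeping the first $D$ rows, `W_N[:D]`, follows the convention that the least informative elements of $y$ come last.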
The solution is still not unique because each element of $y$ can be scaled individually without changing the minimum. Therefore, the second requirement is that each column of $W_N^T$ has unit length. Since the
¹ Since $z$ and $y$ are zero mean, the unbiased linear MMSE estimator coincides with the linear MMSE estimator.