2.3 The Covariance Matrix 37
operations, as can be illustrated for the preceding example, computing in the
transformed space the distance corresponding to the feature vector [1.5 1]':
Using the Mahalanobis metric with the appropriate covariance matrix we are
able to adjust our classifiers to any particular hyperellipsoidal shape the pattern
clusters might have.
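The following is a minimal sketch of how the Mahalanobis distance might be computed, both directly and as a Euclidean distance in a transformed (whitened) space. The mean, covariance matrix and feature vector are hypothetical values chosen for illustration; they are not those of the preceding example. The function name mahalanobis_distance is likewise an assumption.

```python
import numpy as np

def mahalanobis_distance(x, mean, cov):
    """Mahalanobis distance between pattern x and a class mean."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov * v = diff rather than forming the inverse explicitly.
    v = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ v))

# Hypothetical class statistics and pattern (illustration only).
mean = np.array([0.0, 0.0])
cov = np.array([[2.0, 0.0],
                [0.0, 0.5]])
x = np.array([2.0, 1.0])

# Direct computation: d^2 = 2^2/2 + 1^2/0.5 = 4, so d = 2.
d_mahal = mahalanobis_distance(x, mean, cov)

# Same distance as a Euclidean norm in a transformed space:
# with cov = L L', the vector z = L^{-1}(x - mean) has unit covariance.
L = np.linalg.cholesky(cov)
z = np.linalg.solve(L, x - mean)
d_whitened = float(np.linalg.norm(z))

# Plain Euclidean distance, for comparison.
d_eucl = float(np.linalg.norm(x - mean))

print(d_mahal, d_whitened, d_eucl)
```

With an identity covariance matrix the function reduces to the Euclidean distance, which is one quick way to check such an implementation.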
We now present some important properties of the covariance matrix, to be used
in the following chapters.
Covariance estimation
Until now we have only used sample estimates of mean and covariance computed
from a training set of n patterns per class. As already discussed in section 1.5.2, in
order for a classifier to maintain an adequate performance when presented with
new cases, our mean and covariance estimates must be sufficiently near the
theoretical values, corresponding to n → ∞.
Estimating C corresponds to estimating d(d+1)/2 terms c_ij. Looking at formula
(2-17a), we see that C is the sum of n−1 independent d×d matrices of characteristic
(rank) 1, therefore the computed matrix will be singular if n ≤ d. The conclusion is that
n = d+1 is the minimum number of patterns per class a training set must have in
order for a classifier using the Mahalanobis distance to be designed. Near this
minimum value numerical problems can arise in the computation of C⁻¹.
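The rank argument can be checked numerically. The sketch below assumes the usual unbiased sample covariance (division by n−1), which may differ in detail from formula (2-17a); the data are random numbers generated only for illustration, and the helper name sample_covariance is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3  # number of features

def sample_covariance(X):
    """Unbiased sample covariance of the rows of X (n patterns x d features)."""
    n = X.shape[0]
    centered = X - X.mean(axis=0)
    return centered.T @ centered / (n - 1)

ranks = {}
for n in (d, d + 1, 10 * d):
    X = rng.normal(size=(n, d))       # n random patterns, illustration only
    C = sample_covariance(X)
    ranks[n] = int(np.linalg.matrix_rank(C))
    # A very large condition number signals a (near-)singular C,
    # whose inverse cannot be computed reliably.
    print(f"n={n:2d}  rank={ranks[n]}  cond={np.linalg.cond(C):.3g}")
```

With n = d the estimated matrix has rank d−1 and cannot be inverted; from n = d+1 onwards it has full rank for data in general position, although the condition number typically remains poor until n is considerably larger than d.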
Orthonormal transformation
The orthonormal transformation is a linear transformation which allows one to
derive uncorrelated features from a set of correlated features. In order to see how
this transformation is determined, let us consider the correlated features of Figure
2.11 (feature vector y) and assume that we knew the linear transformation y = Ax,
producing y based on the uncorrelated features corresponding to the feature vector
x, characterized by a simple unit covariance matrix I (circular cluster). Suppose
now that we wished to find the uncorrelated feature vectors z that maintain the
same direction after the transformation:

    Az = λz
The determination of the scalars λ and the vectors z corresponds to solving the
equation (λI − A)z = 0, with I the unit d×d matrix, i.e. |λI − A| = 0, in order to obtain
non-trivial solutions. There are d scalar solutions λ called the eigenvalues of A.
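As an illustration, the eigenvalues of a small matrix can be obtained numerically and checked against both conditions above. This is only a sketch: the 2×2 matrix A below is a hypothetical example, not the transformation of Figure 2.11.

```python
import numpy as np

# Hypothetical transformation matrix, chosen only for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
d = A.shape[0]

# Columns of `eigenvectors` are the vectors z; `eigenvalues` holds the scalars λ.
eigenvalues, eigenvectors = np.linalg.eig(A)

for k in range(d):
    lam = eigenvalues[k]
    z = eigenvectors[:, k]
    # z keeps its direction under the transformation: Az = λz.
    assert np.allclose(A @ z, lam * z)
    # λ is a root of the characteristic equation |λI − A| = 0.
    assert abs(np.linalg.det(lam * np.eye(d) - A)) < 1e-9

print(sorted(eigenvalues))
```

For this matrix the characteristic equation is λ² − 4λ + 3 = 0, so the d = 2 eigenvalues are 1 and 3.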