
2.3 The Covariance Matrix   37


   operations, as can be illustrated for the preceding example by computing, in the
   transformed space, the distance corresponding to the feature vector $[1 \;\; .5 \;\; 1]'$:










     Using the Mahalanobis metric with the appropriate covariance matrix, we are
   able to adjust our classifiers to any particular hyperellipsoidal shape the pattern
   clusters might have.
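   To make the claim concrete, here is a minimal NumPy sketch. It uses a hypothetical zero-mean 2-D cluster with covariance matrix C (values chosen only for illustration, not taken from the preceding example) and compares two points at the same Euclidean distance from the mean. It also maps the points through the inverse of a Cholesky factor of C, which is one convenient "whitening" transformation assumed here, not necessarily the transformation used in the text. The Euclidean distance in that transformed space reproduces the Mahalanobis distance.

```python
import numpy as np

def mahalanobis_sq(x, m, C):
    """Squared Mahalanobis distance (x - m)' C^{-1} (x - m)."""
    diff = np.asarray(x, dtype=float) - np.asarray(m, dtype=float)
    # Solve C u = diff rather than forming C^{-1} explicitly.
    return float(diff @ np.linalg.solve(C, diff))

def whitened_sq(x, m, C):
    """Squared Euclidean norm of L^{-1}(x - m), where C = L L' (Cholesky)."""
    L = np.linalg.cholesky(C)  # lower-triangular factor
    diff = np.asarray(x, dtype=float) - np.asarray(m, dtype=float)
    w = np.linalg.solve(L, diff)
    return float(w @ w)

# Hypothetical cluster: zero mean, strongly correlated features.
m = np.zeros(2)
C = np.array([[1.0, 0.9],
              [0.9, 1.0]])

along = np.array([1.0, 1.0])    # on the major axis of the ellipse
across = np.array([1.0, -1.0])  # on the minor axis

for name, x in (("along", along), ("across", across)):
    print(name,
          "euclid^2 =", float(x @ x),
          "mahal^2 =", round(mahalanobis_sq(x, m, C), 4),
          "whitened^2 =", round(whitened_sq(x, m, C), 4))
```

   Both points lie at squared Euclidean distance 2 from the mean. Their squared Mahalanobis distances, however, are about 1.05 and 20, because C stretches the cluster along the diagonal. The whitened Euclidean norm gives the same two values.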
     We now present some important properties of the covariance matrix, to be used
   in the following chapters.

   Covariance estimation

   Until now we have only used sample estimates of the mean and covariance computed
   from a training set of n patterns per class. As already discussed in section 1.5.2, in
   order for a classifier to maintain adequate performance when presented with
   new cases, our mean and covariance estimates must be sufficiently close to the
   theoretical values, corresponding to $n \to \infty$.
     Estimating C corresponds to estimating the d(d+1)/2 distinct terms $c_{ij}$ (C is
   symmetric). Looking at formula (2-17a), we see that C is the sum of $n-1$ independent
   d×d matrices of rank 1. Its rank is therefore at most $n-1$, and the computed matrix
   will be singular if $n \le d$. The conclusion is that $n = d+1$ is the minimum number
   of patterns per class a training set must have in order for a classifier using the
   Mahalanobis distance to be designed. Near this minimum value, numerical problems can
   arise in the computation of $C^{-1}$.
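   The rank argument can be checked numerically. The sketch below assumes the usual unbiased estimator with divisor $n-1$ (our assumption of what formula (2-17a) denotes) and Gaussian random data. The dimension d = 4, the seed, and the helper name `sample_covariance` are arbitrary, hypothetical choices. Centring the patterns makes their rows sum to zero, so only $n-1$ of the n rank-1 outer products are independent.

```python
import numpy as np

def sample_covariance(X):
    """Unbiased covariance estimate from the rows of X (n patterns, d features)."""
    n = X.shape[0]
    D = X - X.mean(axis=0)     # centred patterns; rows sum to zero
    return D.T @ D / (n - 1)   # sum of n rank-1 outer products, scaled

rng = np.random.default_rng(0)
d = 4
ranks = {}
for n in (d - 1, d, d + 1, 10 * d):
    X = rng.standard_normal((n, d))
    C = sample_covariance(X)
    # An explicit tolerance treats round-off-sized singular values as zero.
    ranks[n] = int(np.linalg.matrix_rank(C, tol=1e-10))
    print(f"n = {n:2d}: rank(C) = {ranks[n]}, singular = {ranks[n] < d}")
```

   The estimate is rank deficient for $n \le d$ and reaches full rank d only from $n = d+1$ onwards. The helper gives the same matrix as `np.cov(X, rowvar=False)`, which also divides by $n-1$.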

   Orthonormal transformation

   The orthonormal transformation is a linear transformation that allows one to
   derive uncorrelated features from a set of correlated features. To see how
   this transformation is determined, let us consider the correlated features of Figure
   2.11 (feature vector y) and assume that we knew the linear transformation $y = Ax$
   producing y from the uncorrelated features corresponding to the feature vector
   x, characterized by a simple unit covariance matrix I (circular cluster). Suppose
   now that we wished to find the uncorrelated feature vectors z that maintain the
   same direction after the transformation:

   $$A z = \lambda z$$

     The determination of the scalars $\lambda$ and the vectors z corresponds to solving
   the equation $(\lambda I - A)z = 0$, with I the unit d×d matrix. Non-trivial solutions
   $z \neq 0$ exist only if $|\lambda I - A| = 0$. There are d scalar solutions $\lambda$
   (counting multiplicities), called the eigenvalues of A.
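   A small numerical check of these equations follows. It assumes a hypothetical symmetric 2×2 matrix A chosen for illustration, not the transformation behind Figure 2.11. For a 2×2 matrix the characteristic polynomial $|\lambda I - A|$ expands to $\lambda^2 - \mathrm{tr}(A)\,\lambda + |A|$, so its roots can be compared with the output of NumPy's general eigen-solver. Each returned vector z is then checked to keep its direction under A, and $\lambda I - A$ is checked to be singular.

```python
import numpy as np

# Hypothetical 2x2 transformation matrix, for illustration only.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Route 1: roots of |lambda I - A| = lambda^2 - tr(A) lambda + |A| (2x2 case only).
char_poly = [1.0, -np.trace(A), np.linalg.det(A)]
lam_roots = np.sort(np.roots(char_poly).real)

# Route 2: NumPy's general eigen-solver; the columns of Z are the vectors z.
lam, Z = np.linalg.eig(A)
order = np.argsort(lam)
lam, Z = lam[order], Z[:, order]

for k in range(A.shape[0]):
    z = Z[:, k]
    # z keeps its direction under A: A z = lambda_k z.
    assert np.allclose(A @ z, lam[k] * z)
    # lambda_k makes (lambda_k I - A) singular, so a non-trivial z exists.
    assert abs(np.linalg.det(lam[k] * np.eye(2) - A)) < 1e-9

print("roots of |lambda I - A|:", lam_roots)
print("eigenvalues from eig:   ", lam)
```

   Both routes give the eigenvalues 1 and 3 for this A. The explicit polynomial route only works for the 2×2 case. For general d, an eigen-solver is the practical choice, because finding polynomial roots is numerically fragile.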