
2.3 The Covariance Matrix


    transformation of x with a diagonal matrix, amounting to a multiplication of each
    feature $x_i$ by some quantity $a_i$, would now be scaled by a new variance $a_i^2 s_i^2$,
    therefore preserving $\|y\|$. However, this simple scaling method would fail to
    preserve distances for the general linear transformation, such as the one illustrated
    in Figure 2.11.
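
      The effect of the two kinds of transformation can be checked numerically. The
    following is a minimal sketch, assuming that the scaled distance between two
    patterns is the Euclidean norm of their difference after dividing each feature by
    its sample standard deviation. The data set, the helper name scaled_distance and
    the two transformation matrices are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical data set: 4 patterns (rows), 2 features (columns).
X = np.array([[0.0, 0.0],
              [2.0, 0.0],
              [0.0, 1.0],
              [2.0, 1.0]])

def scaled_distance(X, k, l):
    """Distance between patterns k and l after dividing each feature
    by its sample standard deviation (assumed scaling, n - 1 divisor)."""
    s = X.std(axis=0, ddof=1)
    return np.linalg.norm((X[k] - X[l]) / s)

# Diagonal transformation: each feature x_i is multiplied by a_i.
A_diag = np.diag([10.0, 0.1])
# General linear transformation (a shear), which mixes the features.
A_shear = np.array([[1.0, 1.0],
                    [0.0, 1.0]])

d_original = scaled_distance(X, 0, 3)
d_diagonal = scaled_distance(X @ A_diag.T, 0, 3)
d_sheared = scaled_distance(X @ A_shear.T, 0, 3)

print(d_original, d_diagonal, d_sheared)
```

      In this sketch the first two printed distances coincide, whereas the third one
    differs, which is the failure described in the text.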
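      In order to have a distance measure that will be invariant to linear
    transformations we need to first consider the notion of covariance, an extension of
    the more familiar notion of variance, measuring the tendency of two features $x_i$ and $x_j$
    to vary in the same direction. The covariance between features $x_i$ and $x_j$ is
    estimated as follows for n patterns:

      $$ c_{ij} = \frac{1}{n-1} \sum_{k=1}^{n} (x_{k,i} - m_i)(x_{k,j} - m_j) \qquad (2\text{-}15) $$

    where $x_{k,i}$ is the value of feature $x_i$ in pattern k and $m_i$ is the mean of
    feature $x_i$.
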
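      Notice that covariances are symmetric, $c_{ij} = c_{ji}$, and that $c_{ii}$ is in fact the usual
    estimation of the variance of $x_i$.
      The covariance is related to the well-known Pearson correlation, estimated as:

      $$ r_{ij} = \frac{c_{ij}}{s_i s_j}, \qquad r_{ij} \in [-1, 1] $$

    where $s_i$ and $s_j$ are the standard deviations of features $x_i$ and $x_j$.
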
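
      The covariance estimate and the two properties just mentioned can be sketched as
    follows. This is an illustration only: it assumes the estimate divides the sum of
    products of deviations by n - 1 (the usual unbiased form), and the function name
    covariance and the small data set are hypothetical.

```python
import numpy as np

def covariance(X, i, j):
    """Estimate the covariance between features i and j from an (n, d)
    array of n patterns, dividing by n - 1 (assumed normalization)."""
    n = X.shape[0]
    m = X.mean(axis=0)  # feature means m_i
    return np.sum((X[:, i] - m[i]) * (X[:, j] - m[j])) / (n - 1)

# Hypothetical data: 5 patterns, 2 features.
X = np.array([[1.0, 2.0],
              [2.0, 3.5],
              [3.0, 3.0],
              [4.0, 6.0],
              [5.0, 5.5]])

c01 = covariance(X, 0, 1)
c10 = covariance(X, 1, 0)
c00 = covariance(X, 0, 0)

print(c01 == c10)                               # symmetry
print(np.isclose(c00, np.var(X[:, 0], ddof=1)))  # c_ii is the variance
```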
      Therefore, the correlation can be interpreted as a standardized covariance.
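
      As a hedged illustration of this interpretation, the sketch below divides each
    covariance by the product of the two standard deviations and compares the result
    with NumPy's own correlation estimate. The data set is hypothetical.

```python
import numpy as np

# Hypothetical data: 5 patterns, 2 features.
X = np.array([[1.0, 2.0],
              [2.0, 3.5],
              [3.0, 3.0],
              [4.0, 6.0],
              [5.0, 5.5]])

C = np.cov(X, rowvar=False)   # covariance estimates (n - 1 divisor)
s = np.sqrt(np.diag(C))       # standard deviations of each feature
R = C / np.outer(s, s)        # each covariance divided by s_i * s_j

print(R)
print(np.allclose(R, np.corrcoef(X, rowvar=False)))
```

      The diagonal of R is 1 and every entry lies in [-1, 1], whatever the units of
    the features.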
      Looking at Figure 2.9, one may rightly guess that circular clusters have no
    privileged direction of variance, i.e., they have equal variance along any direction.
    Consider now the products $v_{ij} = (x_{k,i} - m_i)(x_{k,j} - m_j)$. For any feature vector
    yielding a given $v_{ij}$ value, it is a simple matter for a sufficiently large population to
    find another, orthogonal, feature vector yielding $-v_{ij}$. The $v_{ij}$ products therefore
    cancel out (the variation along one direction is uncorrelated with the variation in
    any other direction), resulting in a covariance matrix that, apart from a scale factor, is the
    unit matrix, C = I.
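
      The cancellation argument can be illustrated with an idealized circular
    cluster. The sketch below assumes, purely for illustration, n patterns evenly
    spaced on a unit circle, so that every pattern has an orthogonal partner a quarter
    turn away; the variable names are hypothetical.

```python
import numpy as np

n = 360  # number of patterns, chosen divisible by 4
theta = 2 * np.pi * np.arange(n) / n
X = np.column_stack([np.cos(theta), np.sin(theta)])  # idealized circular cluster

m = X.mean(axis=0)
v12 = (X[:, 0] - m[0]) * (X[:, 1] - m[1])  # products of deviations

# Pattern k + n/4 is orthogonal to pattern k; pair each product with
# the product of its orthogonal partner.
partner = np.roll(v12, -n // 4)
print(np.allclose(v12 + partner, 0.0))  # the products cancel in pairs

C = np.cov(X, rowvar=False)
print(C)  # off-diagonal terms vanish, diagonal terms are equal
```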
       Let us now turn to the elliptic clusters shown in Figure 2.11. For such ellipses,
     with the major axis subtending a positive angle measured in an anti-clockwise
     direction from the abscissas, one will find more and higher positive $v_{ij}$ values along
     directions around the major axis than negative $v_{ij}$ values along directions around
     the minor axis, therefore resulting in a positive cross covariance $c_{12} = c_{21}$. If the
     major axis subtends a negative angle the covariance is negative. The higher the
     covariance, the "thinner" the ellipse (feature vectors concentrated around the major
     axis). In the cork stoppers example of Figure 2.13, the correlation (and therefore
     also the covariance) between N and PRT10 is high: 0.94.
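
       The dependence of the cross covariance on the orientation and on the
     "thinness" of the ellipse can be sketched with a model cluster. The example
     assumes a hypothetical cluster with standard deviation a along the major axis and
     b along the minor axis, whose covariance matrix is obtained by rotating
     diag(a^2, b^2); the function names are illustrative, not taken from the text.

```python
import numpy as np

def ellipse_covariance(a, b, angle_deg):
    """Covariance matrix of a model cluster with spread a along the major
    axis and b along the minor axis, the major axis making angle_deg with
    the abscissas (anti-clockwise positive)."""
    t = np.deg2rad(angle_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return R @ np.diag([a**2, b**2]) @ R.T

def correlation(C):
    """Correlation between the two features of a 2x2 covariance matrix."""
    return C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])

C_pos = ellipse_covariance(2.0, 1.0, 30.0)    # positive angle
C_neg = ellipse_covariance(2.0, 1.0, -30.0)   # negative angle
C_thin = ellipse_covariance(2.0, 0.2, 30.0)   # same angle, thinner ellipse

print(C_pos[0, 1], C_neg[0, 1])               # opposite signs
print(correlation(C_pos), correlation(C_thin))
```

       In this model the cross covariance equals (a^2 - b^2) sin(t) cos(t), so its
     sign follows the sign of the angle, and at a fixed angle and major-axis spread
     the thinner ellipse yields the larger covariance and correlation.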
       Given  a  set  of  n  patterns  we  can  compute  all  the  covariances  using  formula
     (2-15), and then establish a symmetric covariance matrix: