\sum_{j=1}^{m_1} P_1 \, p_{1j} \, |\Sigma_{1j}|^{-1/2} \exp\!\left[-\frac{1}{2}(X-M_{1j})^T \Sigma_{1j}^{-1}(X-M_{1j})\right]
    \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\;
\sum_{j=1}^{m_2} P_2 \, p_{2j} \, |\Sigma_{2j}|^{-1/2} \exp\!\left[-\frac{1}{2}(X-M_{2j})^T \Sigma_{2j}^{-1}(X-M_{2j})\right] ,        (4.149)


where M_{ij} and \Sigma_{ij} are the expected vector and covariance matrix of the jth cluster in \omega_i.  Or, defining the distances as

    d_{ij}(X) = \frac{1}{2}(X-M_{ij})^T \Sigma_{ij}^{-1}(X-M_{ij}) + \frac{1}{2}\ln|\Sigma_{ij}| - \ln P_i - \ln p_{ij} ,        (4.150)
the classifier becomes

    \sum_{j=1}^{m_1} \exp[-d_{1j}(X)] \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; \sum_{j=1}^{m_2} \exp[-d_{2j}(X)] .        (4.151)

Note that the decision of (4.151) is different from \min_{i,j} d_{ij}(X), which is the Bayes decision if we treat this problem as an (m_1 + m_2)-class problem.  Also, it should be realized that the distances are adjusted by \ln p_{ij} as well as \ln P_i.
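As a concrete illustration, the following sketch (in Python with numpy) evaluates the distances d_{ij}(X) of (4.150) and applies the decision rule (4.151).  The function names and the (P_i, p_{ij}, M_{ij}, \Sigma_{ij}) parameter layout are illustrative assumptions, not part of the text.

import numpy as np

def cluster_distance(X, M, Sigma, P_i, p_ij):
    # d_ij(X) of (4.150): quadratic distance to one cluster, adjusted by
    # (1/2) ln|Sigma_ij|, -ln P_i and -ln p_ij.
    diff = X - M
    _, logdet = np.linalg.slogdet(Sigma)
    return (0.5 * diff @ np.linalg.solve(Sigma, diff)
            + 0.5 * logdet - np.log(P_i) - np.log(p_ij))

def classify_piecewise_quadratic(X, classes):
    # classes[i] = (P_i, [(p_ij, M_ij, Sigma_ij), ...])  -- assumed layout.
    # Decision (4.151): choose the class whose clusters give the larger
    # sum_j exp[-d_ij(X)]; note this differs from assigning X by the single
    # smallest d_ij(X) over all clusters of all classes.
    scores = []
    for P_i, clusters in classes:
        d = np.array([cluster_distance(X, M, S, P_i, p) for p, M, S in clusters])
        scores.append(np.exp(-d).sum())
    return int(np.argmax(scores))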

     Piecewise linear classifiers:  When all covariance matrices are the same in multiclass problems, X^T \Sigma_i^{-1} X and \ln|\Sigma_i| of (4.148) are common among all classes, and (4.148) is reduced to

    \max_i \left[ M_i^T \Sigma^{-1} X - \frac{1}{2} M_i^T \Sigma^{-1} M_i + \ln P_i \right] \;\rightarrow\; X \in \omega_i ,        (4.152)

where \Sigma is the common covariance matrix, and the min of (4.148) is changed to max in (4.152) because of the change of sign.  That is, X is classified to the class with the highest correlation between X and \Sigma^{-1} M_i.  Again, the correlation must be adjusted by constant terms.
     When covariance matrices are different among classes but close to each other, we may replace \Sigma of (4.152) by the averaged covariance.
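Under the same assumptions as the previous sketch, the common-covariance rule (4.152) can be written compactly as follows; \Sigma may be the shared covariance matrix or an averaged one (the prior-weighted averaging shown in the comment is an assumption, since the text does not specify how the covariances are averaged).

def classify_piecewise_linear(X, means, priors, Sigma):
    # Rule (4.152): maximize  M_i^T Sigma^{-1} X - (1/2) M_i^T Sigma^{-1} M_i + ln P_i.
    Sigma_inv = np.linalg.inv(Sigma)
    scores = [M @ Sigma_inv @ X - 0.5 * M @ Sigma_inv @ M + np.log(P)
              for M, P in zip(means, priors)]
    return int(np.argmax(scores))

# When the class covariances differ only slightly, an averaged covariance may be
# substituted, e.g. (an assumed choice of averaging):
#   Sigma = sum(P * S for P, S in zip(priors, class_covs))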
                            Another alternative, particularly when covariance matrices  are not  close
                       to  each other, is to  set a linear discriminant function for each pair  of  classes,
                       and to optimize the coefficients.  Let each discriminant function be