
98     4 Statistical Classification


                             respectively. Figure 4.18 illustrates the normal distribution for the two-dimensional
                             situation.
Given a training set with n patterns T = {x₁, x₂, ..., xₙ} characterized by a distribution with pdf p(x|θ), where θ is a parameter vector of the distribution (e.g. the mean vector of a normal distribution), an interesting way of obtaining sample estimates of the parameter vector θ is to maximize p(T|θ), which, viewed as a function of θ, is called the likelihood of θ for the given training set. Assuming that each pattern is drawn independently from a potentially infinite population, we can express this likelihood as:

\[
p(T \mid \theta) \;=\; \prod_{k=1}^{n} p(\mathbf{x}_k \mid \theta)
\]

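As a minimal numeric sketch of this product form (using an assumed 1-D Gaussian pdf and a small made-up sample, neither taken from the text), the likelihood of θ is the product of the individual pdf values, and computing it through the sum of logarithms gives the same result:

```python
import math

# Hypothetical 1-D Gaussian pdf; mu and sigma play the role of theta.
def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Small illustrative training set (assumed data).
T = [1.2, 0.8, 1.1, 0.9, 1.0]
mu, sigma = 1.0, 0.2

# Likelihood of theta: product of individual pdf values,
# valid because the patterns are drawn independently.
likelihood = 1.0
for x in T:
    likelihood *= gauss_pdf(x, mu, sigma)

# Equivalent computation through the log-likelihood.
log_likelihood = sum(math.log(gauss_pdf(x, mu, sigma)) for x in T)

assert abs(likelihood - math.exp(log_likelihood)) < 1e-12
```

In practice the log form is preferred: for large n the raw product underflows to zero in floating point, while the sum of logarithms remains well behaved.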
When using maximum likelihood estimation of distribution parameters it is often easier to compute the maximum of ln[p(T|θ)], which is equivalent since the logarithm is a monotonically increasing function. For Gaussian distributions, the sample estimates given by formulas (4-21a) and (4-21b) are maximum likelihood estimates and will converge to the true values with an increasing number of cases. The reader can find a detailed explanation of the parameter estimation issue in Duda and Hart (1973).
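A small sketch of this claim for a 1-D Gaussian (the synthetic sample and its true parameters are assumptions for illustration): the sample mean and the variance computed with divisor n are the maximum likelihood estimates, the log-likelihood is larger at these estimates than at perturbed values, and with many cases they land close to the true parameters.

```python
import math
import random

random.seed(0)
# Assumed synthetic 1-D sample with true parameters mu=2.0, sigma=0.5.
data = [random.gauss(2.0, 0.5) for _ in range(10000)]
n = len(data)

# Maximum likelihood estimates for a 1-D Gaussian:
# sample mean, and variance with divisor n (cf. (4-21a) and (4-21b)).
mu_hat = sum(data) / n
var_hat = sum((x - mu_hat) ** 2 for x in data) / n

def log_likelihood(mu, var):
    return sum(-0.5 * math.log(2 * math.pi * var)
               - (x - mu) ** 2 / (2 * var) for x in data)

# The log-likelihood peaks at the ML estimates.
L_hat = log_likelihood(mu_hat, var_hat)
assert L_hat > log_likelihood(mu_hat + 0.1, var_hat)
assert L_hat > log_likelihood(mu_hat, var_hat * 1.2)

# With an increasing number of cases the estimates approach the true values.
assert abs(mu_hat - 2.0) < 0.05
assert abs(math.sqrt(var_hat) - 0.5) < 0.05
```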
As can be seen from (4-21), the surfaces of equal probability density for the normal likelihood follow the Mahalanobis metric already discussed in sections 2.2, 2.3 and 4.1.3.
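This can be checked numerically for a 2-D normal with an assumed mean and covariance (illustrative values, not from the text): two different points with the same Mahalanobis distance from the mean have exactly the same probability density, i.e. they lie on the same equal-density ellipse.

```python
import math

# Assumed 2-D Gaussian parameters (illustrative only).
mu = (0.0, 0.0)
C = [[2.0, 0.6], [0.6, 1.0]]

det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
Cinv = [[C[1][1] / det, -C[0][1] / det],
        [-C[1][0] / det, C[0][0] / det]]

def mahalanobis2(x):
    """Squared Mahalanobis distance of x from mu."""
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    return (d0 * (Cinv[0][0] * d0 + Cinv[0][1] * d1)
            + d1 * (Cinv[1][0] * d0 + Cinv[1][1] * d1))

def pdf(x):
    """2-D normal density; depends on x only through mahalanobis2(x)."""
    return math.exp(-0.5 * mahalanobis2(x)) / (2 * math.pi * math.sqrt(det))

# Cholesky factor L of C (C = L L^T); points x = L u with u on the unit
# circle all have squared Mahalanobis distance 1 from the mean.
a = math.sqrt(C[0][0])
l10 = C[1][0] / a
l11 = math.sqrt(C[1][1] - l10 ** 2)

def ellipse_point(t):
    u = (math.cos(t), math.sin(t))
    return (a * u[0], l10 * u[0] + l11 * u[1])

p1, p2 = ellipse_point(0.3), ellipse_point(2.1)
assert abs(mahalanobis2(p1) - 1.0) < 1e-12
assert abs(mahalanobis2(p2) - 1.0) < 1e-12
# Equal Mahalanobis distance implies equal density.
assert abs(pdf(p1) - pdf(p2)) < 1e-12
```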



Figure 4.18. The bell-shaped surface of a two-dimensional normal distribution. An ellipse of equal probability density points is also shown.


Let us now proceed to compute the decision function (4-18d) for normally distributed features:

\[
g_i(\mathbf{x}) \;=\; P(\omega_i)\, p(\mathbf{x} \mid \omega_i)
\;=\; \frac{P(\omega_i)}{(2\pi)^{d/2}\,\lvert \Sigma_i \rvert^{1/2}}
\exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^{\mathsf{T}} \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right)
\]

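A minimal sketch of evaluating this decision function, for a 1-D two-class problem with assumed means, variances and priors (all illustrative values): each class score is the prior times the Gaussian likelihood, and since the logarithm is monotonic the log-transformed scores yield the same decisions.

```python
import math

# Assumed 1-D two-class problem (illustrative parameters).
classes = [
    {"mu": 0.0, "var": 1.0, "prior": 0.6},
    {"mu": 2.0, "var": 1.5, "prior": 0.4},
]

def g(x, c):
    """Decision function: class prior times the Gaussian likelihood."""
    pdf = math.exp(-(x - c["mu"]) ** 2 / (2 * c["var"])) \
          / math.sqrt(2 * math.pi * c["var"])
    return c["prior"] * pdf

def h(x, c):
    """Log of g; monotonic transform, so it gives the same decisions."""
    return (math.log(c["prior"]) - 0.5 * math.log(2 * math.pi * c["var"])
            - (x - c["mu"]) ** 2 / (2 * c["var"]))

# Points near each class mean are assigned to that class, and the
# log-transformed scores always agree with the raw ones.
for x in (-1.0, 0.0, 1.2, 2.0, 3.0):
    decide_g = max(range(2), key=lambda i: g(x, classes[i]))
    decide_h = max(range(2), key=lambda i: h(x, classes[i]))
    assert decide_g == decide_h

assert g(0.0, classes[0]) > g(0.0, classes[1])
assert g(2.0, classes[1]) > g(2.0, classes[0])
```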
We may apply a monotonic logarithmic transformation (see 2.1.1), obtaining: