
4.3 Model-Free Techniques
Figure 4.27. Parzen window estimation of a one-dimensional distribution, using a rectangular window.



The Parzen window method is a generalization of this formula, such that, instead of a hard-limiting hypercube window φ, we use any smooth window satisfying simple conditions of positiveness and unitary volume. The multivariate normal function is a frequently used window function. The role of this smoothing window, known as interpolating kernel, is to weight the contribution of each training set vector xi to the pdf estimate at x, in accordance with its deviation from x. As for the smoothing parameter h(n), which must vary inversely with n, Duda and Hart (1973) suggest 1/√n. This can be taken as a first guess to be adjusted experimentally. Concerning the choice of kernel function φ, the normal function with estimated covariance is in fact the optimal choice for a large family of symmetric distributions that includes the normal pdf itself (see Fukunaga, 1990).
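As a concrete illustration of the method described above, the following Python sketch estimates a one-dimensional pdf with a Gaussian (normal) kernel and the h(n) = h1/√n smoothing rule. The function name `parzen_estimate` and the default value of `h1` are illustrative assumptions, not part of the book's Parzen.xls tool:

```python
import numpy as np

def parzen_estimate(x, samples, h1=1.0):
    """Parzen window pdf estimate at the points x, using a Gaussian
    kernel and the smoothing rule h(n) = h1 / sqrt(n)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    h = h1 / np.sqrt(n)                       # width shrinks as n grows
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Each training sample contributes a Gaussian bump centred on it,
    # weighted by its deviation from the evaluation point x.
    u = (x[:, None] - samples[None, :]) / h
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (n * h)       # average of scaled kernels
```

Because the Gaussian kernel is positive and integrates to one, the resulting estimate is itself a valid pdf for any choice of h1, which is then tuned experimentally as the text suggests.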
Let us consider the LogNorm dataset in the Parzen.xls file (see Appendix A), which has a lognormal distribution characterized by a left asymmetry. Figure 4.28 illustrates the influence of the number of points and the smoothing factor h(n) on the density estimate. Notice also the difficulty in obtaining a good adjustment of the peaked part of the distribution, which requires a high number of training samples. Even for smoother distributions, such as the normal one, we may need a large training set size (see Duda and Hart, 1973) in order to obtain an estimate that follows the true distribution closely. The problem is even worse when there is more than one dimension, due to the curse of dimensionality already referred to previously. The large datasets necessary for accurate pdf estimation may be a difficulty in the application of the method.
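The influence of the sample size n on the quality of the estimate can be checked numerically. Since the LogNorm data in Parzen.xls is not reproduced here, the sketch below draws its own synthetic lognormal samples (the σ = 0.5 shape parameter and the seed are arbitrary assumptions) and compares a Gaussian-kernel Parzen estimate against the known true pdf for a small and a large n:

```python
import numpy as np

rng = np.random.default_rng(0)

def parzen(x, s, h):
    # Gaussian-kernel Parzen pdf estimate at the points x
    u = (x[:, None] - s[None, :]) / h
    return (np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)).sum(axis=1) / (s.size * h)

def lognormal_pdf(x, sigma=0.5):
    # true pdf of a lognormal with log-mean 0 and log-std sigma
    return np.exp(-np.log(x)**2 / (2.0 * sigma**2)) / (x * sigma * np.sqrt(2.0 * np.pi))

xs = np.linspace(0.05, 6.0, 500)
true_pdf = lognormal_pdf(xs)
for n in (50, 1000):
    s = rng.lognormal(mean=0.0, sigma=0.5, size=n)
    est = parzen(xs, s, h=1.0 / np.sqrt(n))   # Duda-Hart rule h(n) = 1/sqrt(n)
    err = np.mean(np.abs(est - true_pdf))
    print(f"n = {n:5d}   mean abs error = {err:.4f}")
```

The peaked left part of a lognormal is exactly where the estimate is hardest to adjust: a small h is needed to resolve the peak, but a small h demands many samples to keep the estimate from becoming spiky.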
Let us now see how the Parzen window method can be applied to real data. For this purpose let us use the cork stoppers data (two classes) and feature ARM. This feature has an unknown asymmetrical distribution with a clear deviation from the normal distribution. The Parzen window method seems, therefore, a sensible alternative to the Bayesian approach. Figure 4.29 shows the Parzen window estimates of the distributions, also included in the Parzen.xls file. From these distributions it is possible to select a threshold for class discrimination. Choosing the threshold corresponding to the distribution intersection point, and assuming equal prevalences, the overall training set error is 24%. This is in fact a very reasonable error rate compared with the 23% obtained in section 4.1.1 for the much more discriminating feature N.
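The threshold-at-the-intersection rule can be sketched as follows. The cork stoppers data lives in Parzen.xls and is not reproduced here, so the two lognormal samples `c1` and `c2` below are synthetic stand-ins for the class-conditional distributions of ARM, and all parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def parzen(x, s, h):
    # Gaussian-kernel Parzen pdf estimate at the points x
    u = (x[:, None] - s[None, :]) / h
    return (np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)).sum(axis=1) / (s.size * h)

# synthetic stand-ins for the two class-conditional samples of ARM
c1 = rng.lognormal(mean=0.0, sigma=0.4, size=50)   # class 1
c2 = rng.lognormal(mean=0.6, sigma=0.4, size=50)   # class 2

xs = np.linspace(0.05, 6.0, 1000)
h = 1.0 / np.sqrt(50)                              # h(n) = 1/sqrt(n)
p1, p2 = parzen(xs, c1, h), parzen(xs, c2, h)

# threshold: first point past the class-1 median where the two
# estimated densities intersect (p2 overtakes p1)
cross = np.argmax((p1 < p2) & (xs > np.median(c1)))
t = xs[cross]

# overall training-set error assuming equal prevalences
err = 0.5 * np.mean(c1 > t) + 0.5 * np.mean(c2 <= t)
print(f"threshold = {t:.2f}, training error = {err:.1%}")
```

With equal prevalences, classifying by comparing the two estimated densities at each point reduces to this single threshold, which is how the 24% training error for ARM is obtained in the text.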