
An implementation of the Parzen window method developed by Specht (1990) is worth mentioning, since it constitutes an efficient way of obtaining estimates of the conditional probabilities of feature vector distributions and, therefore, lets us proceed to their classification, once a representative training set is available. Assuming a Gaussian kernel function, let us rewrite equation (4-36) as:

\[
\hat{p}(x) = \frac{1}{n\,(2\pi)^{d/2}\sigma^{d}} \sum_{i=1}^{n} \exp\left( -\frac{(x - x_i)'(x - x_i)}{2\sigma^{2}} \right)
\]
The summation terms can be computed, except for a scaling factor, in the following way:

1. Normalize x and x_i to unit length: x/||x||, x_i/||x_i||;
2. Compute the normalized dot products z_i = x'x_i / (||x||.||x_i||);
3. Apply the kernel function exp((z_i - 1)/σ²).

Note that for unit-length vectors (x - x_i)'(x - x_i) = 2(1 - z_i), so the terms of step 3 equal the Gaussian kernel values up to the scaling factor. Summing all the exponential terms results in conditional probability estimates, to which we can then apply the decision device illustrated in Figure 4.16.
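The three steps translate directly into code. The following Python sketch (a minimal illustration, not Specht's original implementation; the function names and the width parameter sigma are our own choices, and equal priors are assumed in the decision rule) computes the per-class scores and picks the largest:

```python
import numpy as np

def pnn_score(x, X_class, sigma):
    """Parzen/PNN score for one class: steps 1-3 above plus the final sum.
    X_class holds that class's training vectors, one per row; sigma is the
    Gaussian kernel width (a user-chosen smoothing parameter)."""
    x = x / np.linalg.norm(x)                                      # 1. normalize x
    Xn = X_class / np.linalg.norm(X_class, axis=1, keepdims=True)  # 1. normalize each x_i
    z = Xn @ x                                                     # 2. normalized dot products z_i
    return np.mean(np.exp((z - 1.0) / sigma**2))                   # 3. kernel, then sum

def pnn_classify(x, class_sets, sigma=0.5):
    """Assign x to the class with the largest estimated score
    (equal priors assumed in this sketch)."""
    scores = [pnn_score(x, Xc, sigma) for Xc in class_sets]
    return int(np.argmax(scores))
```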
The computational flow diagram of this method resembles the connectionist structure of a neural network, hence the name probabilistic neural network given to this method. However, it lacks any non-trivial learning capability, which, as we will see later, is a key aspect of neural nets. Statistica makes this method available in its Neural Networks tool. For the two-class cork stoppers data with features N and PRT10, a training set error of 5% is achieved with this method, which is very good compared to the previous result. Notice, however, that this is a training set estimate, which is usually optimistic. For a more accurate error estimate, the classifier would have to be evaluated using one of the methods that will be described in section 4.5.
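To make the notion of a training set (resubstitution) estimate concrete, the following sketch, building on the hypothetical pnn_classify above, evaluates the classifier on the very data it was built from, which is precisely why the resulting error rate tends to be optimistic:

```python
def resubstitution_error(class_sets, sigma=0.5):
    """Fraction of training vectors misclassified when the classifier
    is evaluated on its own training data; usually an optimistic estimate."""
    errors = total = 0
    for true_label, Xc in enumerate(class_sets):
        for x in Xc:
            errors += pnn_classify(x, class_sets, sigma) != true_label
            total += 1
    return errors / total
```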
It is also worth mentioning that Statistical Learning Theory teaches us that pdf estimation is a more difficult type of problem than pattern classification. Therefore, when using the Parzen window method for pattern classification, we are violating a commonsense principle: do not attempt to solve a specified problem by indirectly solving a harder problem as an intermediate step (see e.g. Cherkassky and Mulier, 1998).


                            4.3.2  The K-Nearest Neighbours Method

This method of pdf estimation is based on fixing the number of points k(n) that exist in a certain region centred on a feature vector x. This is done by growing a region around x, with a suitable metric, until k points are captured. These are the k(n) nearest neighbours of x. The region then has a volume V(n) and the pdf estimate is given by:

\[
\hat{p}(x) = \frac{k(n)}{n\,V(n)}
\]
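A minimal Python sketch of this estimate, assuming a Euclidean metric (so that the grown region is a d-dimensional ball, whose volume has the closed form used below), might read:

```python
import numpy as np
from math import pi, gamma

def knn_density(x, X, k):
    """k-NN pdf estimate at x: find the radius of the smallest Euclidean
    ball around x capturing the k nearest training points, then return
    k / (n * V), with V the volume of that ball."""
    n, d = X.shape
    r = np.sort(np.linalg.norm(X - x, axis=1))[k - 1]  # radius to the k-th neighbour
    V = (pi ** (d / 2) / gamma(d / 2 + 1)) * r ** d    # volume of a d-ball of radius r
    return k / (n * V)
```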