An implementation of the Parzen window method developed by Specht (1990)
is worth mentioning, since it constitutes an efficient way of obtaining estimates of
the conditional probabilities of feature vector distributions and, therefore, of
proceeding to their classification once a representative training set is available.
Assuming a Gaussian kernel function, let us rewrite equation (4-36) as:

\[
\hat{p}(x) = \frac{1}{n\,(2\pi)^{d/2}\sigma^{d}} \sum_{i=1}^{n} \exp\!\left(-\frac{(x - x_i)'(x - x_i)}{2\sigma^{2}}\right).
\]
The summation terms can be computed, except for a scaling factor, in the
following way:
1. Normalize x and x_i to unit length: x/||x||, x_i/||x_i||;
2. Compute the normalized dot products z_i = x'x_i / (||x||.||x_i||);
3. Apply the kernel function exp((z_i - 1)/σ²).
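This kernel is, except for a scaling factor, the Gaussian kernel of the rewritten
equation evaluated on the normalized vectors: for unit-length vectors the squared
Euclidean distance depends only on the dot product, since

\[
\|x - x_i\|^2 = \|x\|^2 + \|x_i\|^2 - 2\,x'x_i = 2(1 - z_i),
\quad\text{hence}\quad
\exp\!\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right) = \exp\!\left(\frac{z_i - 1}{\sigma^2}\right).
\]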
Summing all the exponential terms yields conditional probability estimates,
to which we can then apply the decision device illustrated in Figure 4.16.
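As an illustration, the following is a minimal Python sketch of this computation.
The function name parzen_classify, the value of the smoothing factor sigma and the
assumption of equal class priors and class sizes are our own choices, not part of the
original description; in practice, sigma has to be tuned for the data at hand.

import numpy as np

def parzen_classify(x, X_train, y_train, sigma=0.3):
    # 1. Normalize the input pattern and the training patterns to unit length.
    x = x / np.linalg.norm(x)
    Xn = X_train / np.linalg.norm(X_train, axis=1, keepdims=True)
    # 2. Normalized dot products z_i = x'x_i.
    z = Xn @ x
    # 3. Exponential kernel exp((z_i - 1)/sigma^2), equivalent (up to a
    #    scaling factor) to a Gaussian kernel on the normalized vectors.
    k = np.exp((z - 1.0) / sigma ** 2)
    # Summing the kernel values per class gives the (unscaled) conditional
    # probability estimates; the decision picks the largest one
    # (equal priors and equal class sizes assumed).
    classes = np.unique(y_train)
    scores = np.array([k[y_train == c].sum() for c in classes])
    return classes[np.argmax(scores)]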
The computational flow diagram of this method resembles the connectionist
structure of a neural network, hence the name probabilistic neural network
given to this method. However, it lacks any non-trivial learning capability, which,
as we will see later, is a key aspect of neural nets. Statistica makes this method
available in its Neural Networks tool. For the two-class cork stoppers data with
features N and PRT10, a training set error of 5% is achieved with this method,
which is very good compared to the previous result. Notice, however, that this is a
training set estimate, which is usually optimistic. For a more accurate error
estimate, the classifier would have to be evaluated using one of the methods that
will be described in section 4.5.
It is also worth mentioning that Statistical Learning Theory teaches us that pdf
estimation is a more difficult type of problem than pattern classification. Therefore,
when using the Parzen window method for pattern classification, we are violating a
commonsense principle: do not attempt to solve a specific problem by indirectly
solving a harder problem as an intermediate step (see e.g. Cherkassky and Mulier,
1998).
4.3.2 The K-Nearest Neighbours Method
This method of pdf estimation is based on fixing the number of points k(n) that
exist in a certain region centred on a feature vector x. This is done by growing a
region around x, with a suitable metric, until k points are captured. These are the
k(n) nearest neighbours of x. The region then has a volume V(n) and the pdf
estimate is given by:

\[
\hat{p}(x) = \frac{k(n)}{n\,V(n)}.
\]
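To make the procedure concrete, here is a minimal Python sketch of this estimate,
assuming a Euclidean metric and therefore a hyperspherical region; the function
name knn_pdf_estimate and the default value of k are merely illustrative.

from math import gamma, pi
import numpy as np

def knn_pdf_estimate(x, X_train, k=5):
    n, d = X_train.shape
    # Grow a Euclidean ball around x until it captures k points: its radius
    # is the distance from x to its k-th nearest training point (r > 0 assumed).
    dist = np.linalg.norm(X_train - x, axis=1)
    r = np.sort(dist)[k - 1]
    # Volume of a d-dimensional ball of radius r.
    V = pi ** (d / 2) / gamma(d / 2 + 1) * r ** d
    # k-nearest-neighbour pdf estimate: k(n) / (n * V(n)).
    return k / (n * V)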