not available or is not used explicitly. The name may suggest that no
parameters are involved. In fact, however, these methods often require
more parameters than parametric methods do. The difference is that in
nonparametric methods the parameters are not the parameters of the
conditional distributions.
At first sight, nonparametric learning seems more difficult than
parametric learning because nonparametric methods exploit less
knowledge. For some nonparametric methods this is indeed the case. In
principle, such methods can handle arbitrary types of conditional
distributions; their generality is high. The downside of this generality
is that large to very large training sets are needed to compensate for
the lack of prior knowledge about the densities.
Other nonparametric learning methods cannot handle arbitrary types
of conditional distributions. Here, the classifiers being trained are
constrained to some preset computational structure of their decision
function. Consequently, the corresponding decision boundaries are
constrained as well. An example is the linear classifier already
mentioned in Section 2.1.2, whose decision boundaries are linear
(hyper)planes. The advantage of incorporating constraints is that fewer
samples are needed in the training set. The stronger the constraints,
the fewer samples are needed. However, good classifiers can only be
obtained if the constraints match the type of the underlying
problem-specific distributions. Hence, constraining the computational
structure of a classifier requires implicit knowledge of these
distributions.
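To make the idea of a preset computational structure concrete, the
following MATLAB fragment sketches a two-class linear decision function.
The weight vector w and offset b are placeholder values invented for
this sketch; in practice they would result from a training procedure
such as those discussed later in this chapter.

% Minimal sketch of a two-class linear decision function.
% w and b are placeholders; a training procedure provides them in practice.
w = [1.5; -0.8];         % weight vector (assumed for illustration)
b = 0.2;                 % offset (assumed for illustration)
z = [0.4; 1.1];          % measurement vector to be classified
g = w.'*z + b;           % linear decision function g(z) = w'z + b
if g >= 0                % decision boundary: the hyperplane g(z) = 0
    assigned_class = 1;
else
    assigned_class = 2;
end

Whatever the training set, a classifier of this form can only realize
linear decision boundaries; this is the constraint referred to above.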
5.3.1 Parzen estimation and histogramming
The objective of Parzen estimation and histogramming is to obtain
estimates of the conditional probability densities without much prior
knowledge of these densities. As before, the estimation is based on a
labelled training set $T_S$. We use the representation according
to (5.3), i.e. we split the training set into $K$ subsets $T_k$, each
having $N_k$ samples all belonging to class $\omega_k$. The goal is to
estimate the conditional density $p(\mathbf{z}|\omega_k)$ for arbitrary
$\mathbf{z}$.
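As a small illustration, the following MATLAB fragment builds this
per-class representation from a labelled training set. The data and the
variable names (Z, labels, Tk, Nk) are assumptions made for the sketch,
not notation from the text.

% Split a labelled training set into K per-class subsets T_k (cf. (5.3)).
% Z and labels are synthetic stand-ins for a real training set.
Z = randn(100, 2);             % N samples with two measurements each
labels = randi(3, 100, 1);     % class labels omega_k in {1, 2, 3}
K = max(labels);
Tk = cell(K, 1);               % Tk{k} holds the N_k samples of class omega_k
Nk = zeros(K, 1);
for k = 1:K
    Tk{k} = Z(labels == k, :);
    Nk(k) = size(Tk{k}, 1);
end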
A simple way to reach this goal is to partition the measurement space
into a finite number of disjoint regions $R_i$, called bins, and to
count the number of samples that fall in each of these bins. The
estimated probability density within a bin is proportional to that
count. This technique is called histogramming. Suppose that $N_{k,i}$
is the number of samples from $T_k$ that fall in bin $R_i$.
Normalization then gives the estimate
$\hat{p}(\mathbf{z}|\omega_k) = N_{k,i}/(N_k\,\mathrm{Vol}(R_i))$ for
$\mathbf{z} \in R_i$.
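The fragment below sketches this estimate for a one-dimensional
measurement with equally sized bins. The synthetic data and bin layout
are assumptions; histcounts (available from MATLAB R2014b) returns the
bin counts $N_{k,i}$.

% Histogram estimate of p(z | omega_k) for a scalar measurement z.
zk = randn(200, 1);                    % N_k samples of class omega_k (synthetic)
edges = linspace(-4, 4, 21);           % 20 equally sized bins R_i
binwidth = edges(2) - edges(1);        % Vol(R_i) in one dimension
Nki = histcounts(zk, edges);           % bin counts N_{k,i}
phat = Nki / (numel(zk)*binwidth);     % N_{k,i}/(N_k Vol(R_i)) per bin
z = 0.3;                               % an arbitrary z within the binned range
i = find(z >= edges(1:end-1) & z < edges(2:end), 1);
p_z = phat(i);                         % estimated p(z | omega_k)

Note that the estimate is piecewise constant: every $\mathbf{z}$ falling
in the same bin receives the same density value, however the samples are
spread within that bin.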