Page 173 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB
are close to those in Figure 5.6(a), especially in the more important
areas of the measurement space.
The basic PRTools code used to generate Figure 5.6 is given in
Listing 5.5.
Listing 5.5
PRTools code for finding and plotting one-nearest neighbour classifiers
on both an edited and a condensed data set. The function edicon takes
a distance matrix as input. In PRTools, calculating a distance matrix is
implemented as a mapping proxm, so z*proxm(z) is the distance
matrix between all samples in z. See Section 7.2.
load nutsbolts;                    % Load the dataset z
J = edicon(z*proxm(z),3,5,[]);     % Edit z
w = knnc(z(J,:),1);                % Train a 1-NNR
figure; scatterd(z(J,:)); plotc(w);
J = edicon(z*proxm(z),3,5,10);     % Edit and condense z
w = knnc(z(J,:),1);                % Train a 1-NNR
figure; scatterd(z(J,:)); plotc(w);
If a non-edited training set is fed into the condensing algorithm, it may
result in erroneous decision boundaries, especially in areas of the
measurement space where the training set is ambiguous.
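The condensing step itself can be illustrated outside PRTools. The following Python/NumPy sketch implements Hart's condensing rule on a toy two-class set; the data and the function name `condense` are illustrative and are not part of PRTools:

```python
import numpy as np

def condense(z, labels):
    """Hart's condensing: keep a subset of samples that classifies
    every training sample correctly with the 1-NN rule."""
    keep = [0]                        # start with a single prototype
    changed = True
    while changed:
        changed = False
        for i in range(len(z)):
            # 1-NN label of sample i using the current subset
            d = np.linalg.norm(z[keep] - z[i], axis=1)
            nn = keep[int(np.argmin(d))]
            if labels[nn] != labels[i]:
                keep.append(i)        # misclassified: add to subset
                changed = True
    return np.array(keep)

# Toy two-class data; condensing retains mainly boundary samples
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
labels = np.array([0] * 30 + [1] * 30)
J = condense(z, labels)
print(len(J), "of", len(z), "samples kept")
```

On termination, the retained subset classifies the full training set without error, which is exactly why feeding an unedited (ambiguous) training set into condensing preserves the erroneous boundaries along with the correct ones.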
5.3.3 Linear discriminant functions
Discriminant functions are functions g_k(z), k = 1, ..., K, that are used
in a decision function as follows:

\hat{\omega}(z) = \omega_n \quad \text{with:} \quad n = \operatorname*{argmax}_{k=1,\ldots,K} \{ g_k(z) \} \qquad (5.35)
Clearly, if the g_k(z) are the posterior probabilities P(\omega_k | z), the decision
function becomes a Bayes decision function with a uniform cost func-
tion. Since the posterior probabilities are not known, the strategy is to
replace the probabilities with some predefined functions g_k(z) whose
parameters should be learned from a labelled training set.
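Decision rule (5.35) is simply an argmax over the K discriminant values. As a concrete illustration, the following Python sketch (with made-up discriminant functions, here negative squared distances to three hypothetical class centres) assigns a sample to the class whose discriminant is largest:

```python
import numpy as np

def decide(z, discriminants):
    """Return the index n maximizing g_k(z) over k = 1,...,K."""
    values = [g(z) for g in discriminants]
    return int(np.argmax(values))

# Illustrative discriminants for K = 3: g_k(z) = -||z - c_k||^2
centers = [np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([0.0, 2.0])]
g = [lambda z, c=c: -np.sum((z - c) ** 2) for c in centers]

n = decide(np.array([1.9, 0.1]), g)
print("assigned class:", n)   # the nearest centre wins
```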
An assumption often made is that the samples in the training set can
be classified correctly with linear decision boundaries. In that case, the
discriminant functions take the form of:
g_k(z) = \mathbf{w}_k^T z + w_k \qquad (5.36)
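Evaluating (5.36) for all K classes at once is a single matrix–vector product. The sketch below uses hand-picked weights (illustrative only, not learned from data), with the weight vectors w_k stored as rows of W and the offsets collected in w0:

```python
import numpy as np

# Hand-picked weights for K = 3 linear discriminants (illustrative only)
W = np.array([[ 1.0,  0.0],      # w_1
              [-1.0,  1.0],      # w_2
              [ 0.0, -1.0]])     # w_3
w0 = np.array([0.0, 0.5, -0.5])  # offsets

def classify(z):
    """g_k(z) = w_k^T z + w_k; assign z to argmax_k g_k(z)."""
    g = W @ z + w0
    return int(np.argmax(g))

z = np.array([2.0, -1.0])
# g(z) = [2.0, -2.5, 0.5], so class index 0 is selected
print(classify(z))
```

Because each g_k is affine in z, the boundary between any two classes k and l is the hyperplane where g_k(z) = g_l(z), which is what makes the linearity assumption above equivalent to linear decision boundaries.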