(1) We can adjust the V's and v₀'s so as to minimize E of (4.157). Since it is difficult to get an explicit mathematical expression for E, the error must be calculated numerically each time the V's and v₀'s are adjusted.
When X is distributed normally for all classes, some simplification can be achieved, since the h's are also normally distributed and p(h_1, ..., h_{L-1} | ω_i) is given by an explicit mathematical expression. Even in this case, the integration of an (L-1)-dimensional normal distribution in the first quadrant must be carried out numerically, using techniques such as the Monte Carlo method.
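To illustrate the numerical step in (1), the sketch below estimates the first-quadrant mass of an (L-1)-dimensional normal distribution by the Monte Carlo method. This is a minimal sketch assuming NumPy; the mean vector and covariance matrix of the h's are hypothetical placeholders for the values that would follow from the V's, v₀'s, and the class-conditional statistics of X.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conditional mean and covariance of the discriminant outputs
# h_1, ..., h_{L-1} for one class (here L - 1 = 2); in practice these follow
# from the V's, v_0's, and the mean and covariance of X for that class.
mean = np.array([1.0, 0.5])
cov = np.array([[1.0, 0.3],
                [0.3, 2.0]])

# Monte Carlo estimate of Pr{h_1 > 0, ..., h_{L-1} > 0}: the probability
# mass of the (L-1)-dimensional normal in the first quadrant.
n_samples = 100_000
h = rng.multivariate_normal(mean, cov, size=n_samples)
prob = np.mean(np.all(h > 0, axis=1))
print(f"Estimated first-quadrant probability: {prob:.4f}")
```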
(2) Design a linear discriminant function between a pair of classes
according to one of the methods discussed previously for two-class problems.
In total, L(L-1)/2 discriminant functions are calculated. Then, use them as a piecewise linear discriminant function without further modification. When each class distribution is quite different from the others, further modification can result in a smaller error. However, in many applications, the decrease in error obtained by further adjustment of the V's and v₀'s is found to be relatively minor.
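A minimal sketch of approach (2) follows, assuming NumPy and three hypothetical normal classes with known means and covariances. Each pairwise boundary is a Fisher-type linear discriminant with v₀ set at the midpoint of the two means; combining the L(L-1)/2 pairwise decisions by majority vote is one plausible way to use them as a piecewise linear classifier without further modification, not necessarily the book's exact construction.

```python
import numpy as np
from itertools import combinations

# Hypothetical three-class problem; in practice M_i and Sigma_i would be
# estimated from training samples.
means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
covs = [np.eye(2) for _ in range(3)]
L = len(means)

# One linear discriminant per class pair:
#   V = (Sigma_i + Sigma_j)^{-1} (M_i - M_j),  v_0 = -V^T (M_i + M_j) / 2,
# so that h_ij(X) = V^T X + v_0 is positive on the class-i side.
discriminants = {}
for i, j in combinations(range(L), 2):
    V = np.linalg.solve(covs[i] + covs[j], means[i] - means[j])
    v0 = -0.5 * V @ (means[i] + means[j])
    discriminants[(i, j)] = (V, v0)

def classify(x):
    # Majority vote over the L(L-1)/2 pairwise decisions.
    votes = np.zeros(L, dtype=int)
    for (i, j), (V, v0) in discriminants.items():
        votes[i if V @ x + v0 > 0 else j] += 1
    return int(np.argmax(votes))

rng = np.random.default_rng(0)
X = rng.multivariate_normal(means[1], covs[1])
print("assigned class:", classify(X))
```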
(3) We can assign the desired output y(X) for a piecewise linear
discriminant function and minimize the mean-square error between the desired
and actual outputs in order to find the optimum V's and v₀'s. The desired out-
puts could be fixed or could be adjusted as variables with constraints. Unfor-
tunately, even for piecewise linearly separable data, there is no proof of con-
vergence.
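A least-squares sketch of approach (3) with fixed desired outputs is given below, assuming NumPy; the choice y_i(X) = +1 for the true class and -1 otherwise is one common assignment, and the data are hypothetical. Each column of the solution matrix holds one pair V_i, v_i0, and a sample is assigned to the class with the largest actual output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: three normal classes in two dimensions.
L, n, N = 3, 2, 300
means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
X = np.vstack([rng.multivariate_normal(m, np.eye(n), N // L) for m in means])
labels = np.repeat(np.arange(L), N // L)

# Fixed desired outputs: +1 for the true class, -1 for the others.
Y = np.where(labels[:, None] == np.arange(L), 1.0, -1.0)

# Augment X with a constant 1 so each column of W holds [V_i; v_i0], and
# minimize the mean-square error between desired and actual outputs.
A = np.hstack([X, np.ones((N, 1))])
W, *_ = np.linalg.lstsq(A, Y, rcond=None)

# Classify by the largest actual output V_i^T X + v_i0.
pred = np.argmax(A @ W, axis=1)
print("training accuracy:", np.mean(pred == labels))
```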
Binary Inputs
In Section 4.1, we showed that for independent binary inputs the Bayes
classifier becomes linear. In this section, we will discuss other properties of
binary inputs.
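As a reminder of that result, the following sketch forms the linear Bayes discriminant for two classes of independent binary inputs. It assumes x_k ∈ {0, 1} coding, equal priors, and hypothetical component probabilities p_ik = Pr{x_k = 1 | ω_i}; the derivation for ±1 coding is analogous.

```python
import numpy as np

# Hypothetical probabilities p_ik = Pr{x_k = 1 | omega_i} for two classes.
p1 = np.array([0.8, 0.6, 0.3])
p2 = np.array([0.2, 0.5, 0.7])

# With independent binary inputs, ln p(X|omega_1) - ln p(X|omega_2) is
# linear in the x_k's:  h(X) = sum_k w_k x_k + w_0 (equal priors assumed).
w = np.log(p1 / p2) - np.log((1 - p1) / (1 - p2))
w0 = np.sum(np.log((1 - p1) / (1 - p2)))

x = np.array([1, 0, 1])  # one binary input vector
print("class 1" if w @ x + w0 > 0 else "class 2")
```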
When we have n binary inputs forming an input vector X, the number of all possible inputs is 2^n, {X_0, ..., X_{2^n - 1}} [see Table 4-2 for an example]. Then the components of X_i, x_k (k = 1, ..., n), satisfy