It basically means that the influence of a single object on the description of the classifier is limited. This upper bound prevents noisy objects with very large weights from completely determining the weight vector, and thus the classifier. The parameter C has a large influence on the final solution, in particular when the classification problem contains overlapping class distributions, and should therefore be set carefully. Unfortunately, it is not clear beforehand what a suitable value for C will be. It depends on both the data and the type of kernel function that is used; no generally applicable number can be given. The only option in a practical application is to run cross-validation (Section 5.4) to optimize C.
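A minimal sketch of such a search is given below. It assumes the PRTools routines svc and crossval used elsewhere in this book; the candidate range for C and the number of folds are arbitrary choices, not recommendations.

load nutsbolts;                        % mechanical parts dataset z (assumed available)
Cs = [0.1 1 10 100 1000];              % candidate upper bounds for C (assumed range)
err = zeros(size(Cs));
for i = 1:length(Cs)
    % ten-fold cross-validation error of a quadratic-kernel support vector classifier
    err(i) = crossval(z, svc([], 'p', 2, Cs(i)), 10);
end
[emin, best] = min(err);
w = svc(z, 'p', 2, Cs(best));          % retrain on all data with the selected C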
The support vector classifier has many advantages. A unique global
optimum for its parameters can be found using standard optimization
software. Nonlinear boundaries can be used without much extra com-
putational effort. Moreover, its performance is very competitive with
other methods. A drawback is that the complexity of the optimization problem scales not with the dimension of the samples, but with the number of samples. For large sample sizes ($N_S > 1000$), general quadratic programming software will often fail, and special-purpose optimizers that exploit problem-specific speedups have to be used.
A second drawback is that, like the perceptron, the classifier is basically a two-class classifier. The simplest solution for obtaining a classifier with more than two classes is to train K classifiers, each distinguishing one class from the rest (similar to the place coding mentioned above). The classifier with the highest output $\mathbf{w}_k^{\mathrm{T}}\mathbf{z} + b_k$ then determines the class label. Although this solution is simple to implement and works reasonably well, it can lead to problems because the output value of the support vector classifier is determined only by the margin between the two classes it is trained on, and is not optimized for use as a confidence estimate.
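For concreteness, the one-against-the-rest decision rule can be sketched as follows; all variable names are hypothetical, and the linear form matches the output expression above.

% One-against-the-rest decision rule (sketch). W is a d-by-K matrix whose
% kth column is the weight vector w_k, b is a K-by-1 vector of offsets,
% and z is a d-by-1 sample to be classified.
outputs = W' * z + b;                  % the K outputs w_k^T z + b_k
[maxval, class_label] = max(outputs);  % the class with the highest output wins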
Other methods train the K classifiers simultaneously, incorporating the one-class-against-the-rest labelling directly into the constraints of the optimization. This again yields a quadratic optimization problem, but the number of constraints increases significantly, which complicates the optimization.
Example 5.6 Classification of mechanical parts, support vector classifiers
Decision boundaries found by support vector classifiers for the mechanical parts example are shown in Figure 5.11. These plots were generated by the code shown in Listing 5.7. In Figure 5.11(a), the kernel used was a polynomial one with degree $d = 2$ (a quadratic kernel); in Figure 5.11(b), it was a Gaussian kernel with a width