2 Pattern Discrimination
In the original feature space we are dealing with 2 features and will have to
compute (adjust) 15 weights. We may choose instead to operate in a transformed
feature space, the space of the monomial functions xixj (x1², x1, x1x2, x2²,
etc.). We will then have to compute in a straightforward way 14 features plus a
bias term (w0). The eventual benefit derived from working in this higher
dimensionality space is that it may be possible to determine a linear decision
function involving easier computations and weight adjustments (see Exercises 2.3
and 2.4). However, working in high dimensional spaces raises a number of
difficulties, as explained later.
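As a concrete check of the counts above, the monomials of degree up to 4 in two features can be enumerated directly. This is a minimal sketch (the helper name is ours, not the book's), representing each monomial as the tuple of variable indices it multiplies:

```python
from itertools import combinations_with_replacement

def monomials(d, k):
    """All monomials of degree <= k in variables x1..xd, each
    represented as a tuple of variable indices (() is the constant)."""
    terms = []
    for degree in range(k + 1):
        terms.extend(combinations_with_replacement(range(1, d + 1), degree))
    return terms

terms = monomials(2, 4)
# (1,) is x1, (1, 2) is x1*x2, (2, 2) is x2^2, and so on
print(len(terms))  # 15: 14 monomial features plus the constant (bias w0)
```

The 15 terms are exactly the 14 transformed features plus the constant term that the bias weight w0 multiplies.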
For a PR problem originally stated in a d-feature space, it can be shown that in
order to determine a polynomial decision function of degree k one has to compute:
C(d + k, k) = (d + k)! / (d! k!)   polynomial coefficients,   (2-7)
with C(n, k) denoting the number of combinations of n elements taken k at a time.
For d=2 and k=4 expression (2-7) yields 15 polynomial coefficients. For
quadratic decision functions one would need to compute:
C(d + 2, 2) = (d + 2)! / (d! 2!) = (d + 2)(d + 1) / 2   polynomial coefficients.
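Both counts follow from the single combinatorial formula (2-7). A quick sketch (the function name is ours) using Python's math.comb:

```python
from math import comb

def n_poly_coeffs(d, k):
    """Number of coefficients of a degree-k polynomial decision
    function in d features: C(d + k, k) = (d + k)! / (d! k!)."""
    return comb(d + k, k)

print(n_poly_coeffs(2, 4))   # 15, the d=2, k=4 example above
print(n_poly_coeffs(2, 2))   # 6 = (2+2)(2+1)/2, the quadratic case
print(n_poly_coeffs(10, 3))  # 286 - the count grows quickly with d and k
```

The last line hints at the dimensionality difficulties mentioned earlier: even a cubic decision function in 10 features already requires hundreds of coefficients.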
Linear decision functions are quite popular, as they are easier to compute and
have simpler mathematics. This is why in the following we devote more attention
to linear separability.
2.1.2 Hyperplane Separability
Let us consider again the situation depicted in Figure 2.1. The decision "surface"
divides the feature space into two decision regions: the half plane above the
straight line, containing class ω1, and the half plane below the straight line,
containing class ω2.
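The two half planes can be sketched as the sign of a linear decision function d(x) = w1 x1 + w2 x2 + w0. The weights below are illustrative, not those behind Figure 2.1:

```python
# Illustrative weights for a linear decision function in d=2
# (not the weights of Figure 2.1).
w1, w2, w0 = 1.0, -2.0, 0.5

def decide(x1, x2):
    """Assign class omega1 where d(x) > 0, class omega2 where d(x) < 0."""
    return "omega1" if w1 * x1 + w2 * x2 + w0 > 0 else "omega2"

print(decide(3.0, 0.0))  # d(x) = 3.5 > 0  -> omega1
print(decide(0.0, 2.0))  # d(x) = -3.5 < 0 -> omega2
```

The line d(x) = 0 itself is the decision surface separating the two regions.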
In a multiple class problem, with several decision surfaces, arbitrarily complex
decision regions can be expected, and the separation of the classes can be achieved
in essentially two ways: each class can be separated from all the others (absolute
separation), or the classes can only be separated into pairs (pairwise separation). In
the following we analyse these two types of class discrimination assuming linear
separability by hyperplanes.
Absolute separation
Absolute separation corresponds strictly to definition (2-3): any class is separable
from all the others. Figure 2.6 illustrates the absolute separation for d=2 and c=3.
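A minimal sketch of absolute separation for c=3 classes in d=2 (the discriminants below are made up for illustration, not read off Figure 2.6): a pattern is assigned to class i when its discriminant g_i(x) is positive and every other discriminant is negative; otherwise it falls in an indeterminate region.

```python
# Illustrative linear discriminants g1, g2, g3 (not those of Figure 2.6).
g = [
    lambda x1, x2: -x1 + 1.0,      # g1 > 0 on class 1's side of its hyperplane
    lambda x1, x2: x1 - x2 - 2.0,  # g2 > 0 on class 2's side
    lambda x1, x2: x2 - 2.0,       # g3 > 0 on class 3's side
]

def classify(x1, x2):
    """Absolute separation: exactly one positive discriminant decides."""
    signs = [gi(x1, x2) > 0 for gi in g]
    return signs.index(True) + 1 if signs.count(True) == 1 else None

print(classify(0.0, 0.0))  # g1 = 1 > 0, g2 = -2, g3 = -2 -> class 1
print(classify(1.5, 1.0))  # no discriminant positive -> indeterminate (None)
```

The second call shows a point where no class claims the pattern, illustrating that absolute separation leaves indeterminate regions where no single discriminant is positive.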