
2 Pattern Discrimination

  In the original feature space we are dealing with 2 features and will have to compute (adjust) 15 weights. We may choose instead to operate in a transformed feature space, the space of the monomial functions x1^a x2^b (x1^2, x2^2, x1^2 x2^2, x1^4, etc.). We will then have to compute in a straightforward way 14 features plus a bias term (w0). The eventual benefit of working in this higher dimensionality space is that it may be possible to determine a linear decision function involving easier computations and weight adjustments (see Exercises 2.3 and 2.4). However, working in high dimensional spaces raises a number of difficulties, as explained later.
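The monomial feature expansion described above can be sketched in a few lines. This is an illustrative helper (the function name and inputs are my own, not from the text): it enumerates all monomials of total degree 1 through k, which for d=2 and k=4 gives the 14 features mentioned, to which the bias term w0 is added.

```python
from itertools import combinations_with_replacement

def monomial_features(x, degree):
    """All monomials x1^a * x2^b * ... with total degree 1..degree."""
    feats = []
    for k in range(1, degree + 1):
        # Each multiset of variable indices corresponds to one monomial.
        for combo in combinations_with_replacement(range(len(x)), k):
            v = 1.0
            for i in combo:
                v *= x[i]
            feats.append(v)
    return feats

# For d=2 features and degree k=4: 14 monomials, plus the bias term w0 -> 15 weights.
print(len(monomial_features([0.5, 2.0], 4)))  # 14
```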
  For a PR problem originally stated in a d-feature space, it can be shown that in order to determine a polynomial decision function of degree k one has to compute:

   C(d + k, k) = (d + k)! / (d! k!)   polynomial coefficients,      (2-7)

        with C(n, k)  denoting the number of combinations of n elements taken k at a time.
          For  d=2  and  k=4  expression  (2-7)  yields  15  polynomial  coefficients.  For
        quadratic decision functions one would need to compute:

   C(d + 2, 2) = (d + 2)! / (d! 2!) = (d + 2)(d + 1) / 2   polynomial coefficients.
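The coefficient counts above are plain binomial coefficients, so they can be checked directly; this small sketch (function name is my own) reproduces the numbers quoted in the text.

```python
from math import comb

def n_poly_coeffs(d, k):
    """Number of coefficients of a degree-k polynomial decision
    function in d features: C(d + k, k) = (d + k)! / (d! k!)."""
    return comb(d + k, k)

print(n_poly_coeffs(2, 4))  # 15, the d=2, k=4 case from the text
print(n_poly_coeffs(2, 2))  # 6, quadratic case: (d + 2)(d + 1) / 2
```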
  Linear decision functions are quite popular, as they are easier to compute and have simpler mathematics. This is why in the following we devote more attention to linear separability.


        2.1.2 Hyperplane Separability

Let us consider again the situation depicted in Figure 2.1. The decision "surface" divides the feature space into two decision regions: the half plane above the straight line, containing class ω1, and the half plane below the straight line, containing class ω2.
  In a multiple class problem, with several decision surfaces, arbitrarily complex decision regions can be expected, and the separation of the classes can be achieved in essentially two ways: each class can be separated from all the others (absolute separation), or the classes can only be separated into pairs (pairwise separation). In the following we analyse these two types of class discrimination, assuming linear separability by hyperplanes.
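A two-class linear decision rule of the kind discussed here can be sketched as follows. The weight values are hypothetical, chosen only for illustration: the hyperplane d(x) = w1*x1 + w2*x2 + w0 = 0 separates the plane, and the sign of d(x) assigns the class.

```python
# Hypothetical weights for a 2-D linear decision function
# d(x) = w[0]*x1 + w[1]*x2 + w0 (illustration only, not from the text).
w = (1.0, -1.0)
w0 = 0.5

def decide(x):
    """Assign class omega_1 if d(x) > 0, else omega_2.
    The decision surface is the hyperplane d(x) = 0."""
    d = w[0] * x[0] + w[1] * x[1] + w0
    return "omega_1" if d > 0 else "omega_2"

print(decide((2.0, 0.0)))  # omega_1: the point lies on the positive side
print(decide((0.0, 2.0)))  # omega_2: the point lies on the negative side
```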

        Absolute separation

        Absolute separation corresponds strictly to definition (2-3): any class is separable
        from all the others. Figure 2.6 illustrates the absolute separation for d=2 and c=3.