

   $\mathbf{x} = \begin{bmatrix} N \\ \mathrm{PRT10} \end{bmatrix}$   or   $\mathbf{x} = [N \;\; \mathrm{PRT10}]'$ .                          6.8

   In this two-dimensional feature space, the minimum Euclidean distance classifier is implemented as follows (see Figure 6.5):

   1.  Draw the straight line (decision surface) equidistant from the two sample means, i.e., perpendicular to the segment linking the means and passing through its midpoint.
   2.  Assign any case above the straight line to ω2 and any case below it to ω1. The assignment is arbitrary if the case falls exactly on the straight-line boundary (see the R sketch after this list).
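
   As a rough illustration, and not the book's own code, this rule can be sketched in R. The data frame name cork, its feature columns N and PRT10, and the label column class (values 1 and 2) are assumptions made for this example:

      # Class prototypes: the sample mean vector of each class.
      m1 <- colMeans(cork[cork$class == 1, c("N", "PRT10")])
      m2 <- colMeans(cork[cork$class == 2, c("N", "PRT10")])

      # Minimum Euclidean distance rule: assign x to the class whose
      # mean is nearest; a tie (case on the boundary) is left arbitrary.
      euclid_classify <- function(x, m1, m2) {
        d1 <- sum((x - m1)^2)    # squared distance to the omega_1 mean
        d2 <- sum((x - m2)^2)    # squared distance to the omega_2 mean
        if (d1 == d2) NA else if (d1 < d2) 1 else 2
      }

      # Classify every case in the (assumed) data frame.
      pred <- apply(cork[, c("N", "PRT10")], 1, euclid_classify,
                    m1 = m1, m2 = m2)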

   Note that using PRT10 instead of PRT in the scatter plot of Figure 6.5 makes it easier to compare the contributions of the two features, since their ranges are then practically the same.
   Counting the number of wrongly classified cases, we find that the overall error falls to 18%. The addition of PRT10 to the classifier therefore seems beneficial.
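
   As a hedged continuation of the sketch above (pred and cork$class are the assumed names introduced there), the overall error can be estimated in R by comparing predicted and true labels:

      # Resubstitution estimate of the overall error: the fraction of
      # wrongly classified cases among all cases.
      overall_error <- mean(pred != cork$class, na.rm = TRUE)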



           6.2.2 Minimum Mahalanobis Distance Discriminant

   In the previous section, we used the Euclidean distance in order to derive the minimum distance classifier rule. Since the features are random variables, it seems reasonable that the distance of a feature vector to the class prototype (class sample mean) should reflect the multivariate distribution of the features. Many multivariate distributions have probability functions that depend on the joint covariance matrix. This is the case with the multivariate normal distribution, as described in section A.8.3 (see formula A.53). Let us assume that all classes have an identical covariance matrix Σ, reflecting a similar hyperellipsoidal shape of the corresponding feature vector distributions. The “surfaces” of equal probability density of the feature vectors relative to a sample mean vector $\mathbf{m}_k$ then correspond to a constant value of the following squared Mahalanobis distance:

   $d_k^2(\mathbf{x}) = (\mathbf{x} - \mathbf{m}_k)'\,\Sigma^{-1}(\mathbf{x} - \mathbf{m}_k)$ ,                   6.9

              When the covariance matrix is the unit matrix, we obtain:

   $d_k^2(\mathbf{x}) = (\mathbf{x} - \mathbf{m}_k)'\,\mathbf{I}^{-1}(\mathbf{x} - \mathbf{m}_k) = (\mathbf{x} - \mathbf{m}_k)'(\mathbf{x} - \mathbf{m}_k)$ ,

   which is the squared Euclidean distance of formula 6.7.
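
   As a minimal sketch in R, formula 6.9 can be computed with an estimated covariance matrix S standing in for Σ; the function name mahal2 and the numeric values below are illustrative only, and R's built-in mahalanobis function returns the same squared distance:

      # Squared Mahalanobis distance of x to the prototype m (formula 6.9).
      mahal2 <- function(x, m, S) {
        v <- x - m
        as.numeric(t(v) %*% solve(S) %*% v)   # (x - m)' S^-1 (x - m)
      }

      # With the unit covariance matrix the value reduces to the squared
      # Euclidean distance, as shown above:
      x <- c(55, 36); m <- c(50, 40)          # illustrative values
      mahal2(x, m, diag(2))                   # 41
      sum((x - m)^2)                          # 41, the same value
      mahalanobis(x, m, diag(2))              # built-in equivalent: 41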