Page 247 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
228 6 Statistical Classification
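$$\mathbf{x} = \begin{bmatrix} N \\ \mathrm{PRT10} \end{bmatrix} \quad \text{or} \quad \mathbf{x} = [\, N \;\; \mathrm{PRT10} \,]'. \tag{6.8}$$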
In this two-dimensional feature space, the minimum Euclidian distance
classifier is implemented as follows (see Figure 6.5):
1. Draw the straight line (decision surface) equidistant from the sample means,
i.e., perpendicular to the segment linking the means and passing through its
midpoint.
2. Any case above the straight line is assigned to $\omega_2$. Any case below it is
assigned to $\omega_1$. The assignment is arbitrary if the case falls on the
straight-line boundary.
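The two steps above can be sketched in code. The following is a minimal illustration, not the book's implementation: the function names `squared_euclidian`, `classify` and `linear_discriminant` are hypothetical, and the two class means are made-up values chosen only to make the example runnable (they are not the cork-stopper sample means). The sketch also uses the fact that the equidistant line can be written as $(\mathbf{m}_1 - \mathbf{m}_2)'(\mathbf{x} - (\mathbf{m}_1 + \mathbf{m}_2)/2) = 0$, which follows from equating the two squared distances.

```python
# Minimal sketch of a two-class minimum Euclidian distance classifier.
# All numeric values are illustrative assumptions, not data from the text.

def squared_euclidian(x, m):
    """Squared Euclidian distance between feature vector x and prototype m."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, m))

def classify(x, m1, m2):
    """Return 1 or 2, the class whose mean is nearest to x.

    A tie (x on the decision line) is assigned to class 1; the text notes
    that this choice is arbitrary.
    """
    return 1 if squared_euclidian(x, m1) <= squared_euclidian(x, m2) else 2

def linear_discriminant(x, m1, m2):
    """Value of (m1 - m2)'(x - (m1 + m2)/2).

    Positive when x is nearer m1, negative when nearer m2, and zero on the
    straight line that is perpendicular to the segment m1-m2 at its midpoint.
    """
    return sum((a - b) * (xi - (a + b) / 2) for xi, a, b in zip(x, m1, m2))

# Hypothetical class means in the (N, PRT10) plane.
m1 = (50.0, 35.0)
m2 = (80.0, 60.0)
midpoint = tuple((a + b) / 2 for a, b in zip(m1, m2))

case = (55.0, 40.0)                     # a hypothetical case to classify
label = classify(case, m1, m2)          # nearest-mean decision
g = linear_discriminant(case, m1, m2)   # same decision, read from the sign
```

Comparing distances and checking the sign of the linear function give the same assignment, which is why the decision surface of this classifier is a straight line in two dimensions.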
Note that using PRT10 instead of PRT in the scatter plot of Figure 6.5 eases the
comparison of feature contribution, since the feature ranges are practically the
same.
Counting the number of wrongly classified cases, we notice that the overall
error falls to 18%. The addition of PRT10 to the classifier seems beneficial.
6.2.2 Minimum Mahalanobis Distance Discriminant
In the previous section, we used the Euclidian distance in order to derive the
minimum distance classifier rule. Since the features are random variables, it seems
a reasonable assumption that the distance of a feature vector to the class prototype
(class sample mean) should reflect the multivariate distribution of the features.
Many multivariate distributions have probability functions that depend on the joint
covariance matrix. This is the case with the multivariate normal distribution, as
described in section A.8.3 (see formula A.53). Let us assume that all classes have
an identical covariance matrix Σ, reflecting a similar hyperellipsoidal shape of the
corresponding feature vector distributions. The “surfaces” of equal probability
density of the feature vectors relative to a sample mean vector $\mathbf{m}_k$ correspond to a
constant value of the following squared Mahalanobis distance:
$$d_k^2(\mathbf{x}) = (\mathbf{x} - \mathbf{m}_k)'\, \Sigma^{-1} (\mathbf{x} - \mathbf{m}_k). \tag{6.9}$$
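To see why constant squared Mahalanobis distance means constant probability density, it may help to write out the multivariate normal density in its usual textbook form. The block below is a sketch under that assumption (the symbol $n$ for the number of features is introduced here for illustration, and the class mean is replaced by its sample estimate $\mathbf{m}_k$):

$$p(\mathbf{x} \mid \omega_k) = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}} \exp\!\left( -\tfrac{1}{2}\, (\mathbf{x} - \mathbf{m}_k)'\, \Sigma^{-1} (\mathbf{x} - \mathbf{m}_k) \right) = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}} \exp\!\left( -\tfrac{1}{2}\, d_k^2(\mathbf{x}) \right).$$

The factor in front of the exponential does not depend on $\mathbf{x}$, so the density varies with $\mathbf{x}$ only through $d_k^2(\mathbf{x})$. All points with the same $d_k^2(\mathbf{x})$ therefore have the same density, and they lie on a hyperellipsoid centred at $\mathbf{m}_k$.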
When the covariance matrix is the unit matrix, we obtain:
$$d_k^2(\mathbf{x}) = (\mathbf{x} - \mathbf{m}_k)'\, \mathbf{I}^{-1} (\mathbf{x} - \mathbf{m}_k) = (\mathbf{x} - \mathbf{m}_k)' (\mathbf{x} - \mathbf{m}_k),$$
which is the squared Euclidian distance of formula 6.7.
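As a small numerical check of this reduction, the following sketch computes the squared Mahalanobis distance for two features. It is an illustration under stated assumptions rather than the book's code: the helper names `inverse_2x2` and `squared_mahalanobis` are hypothetical, the covariance matrix is inverted by hand for the 2×2 case only, and all numeric values are made up.

```python
# Minimal sketch: squared Mahalanobis distance in two dimensions.
# Values and function names are illustrative assumptions.

def inverse_2x2(s):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = s
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))

def squared_mahalanobis(x, m, cov):
    """Compute (x - m)' cov^{-1} (x - m) for two-dimensional x and m."""
    inv = inverse_2x2(cov)
    diff = (x[0] - m[0], x[1] - m[1])
    return sum(diff[i] * inv[i][j] * diff[j]
               for i in range(2) for j in range(2))

def squared_euclidian(x, m):
    """Squared Euclidian distance, for comparison."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, m))

m = (0.0, 0.0)                      # hypothetical class mean
x = (2.0, 1.0)                      # hypothetical feature vector
identity = ((1.0, 0.0), (0.0, 1.0))
cov = ((4.0, 0.0), (0.0, 1.0))      # hypothetical covariance, larger spread in feature 1

d2_identity = squared_mahalanobis(x, m, identity)  # equals squared_euclidian(x, m)
d2_cov = squared_mahalanobis(x, m, cov)            # feature 1 deviation is down-weighted
```

With the identity matrix the two distances coincide. With the hypothetical covariance above, the deviation along the first feature counts for less because that feature has the larger variance.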