When two (or more) values are involved, this calculation can include
combinations of the variables. In the case of P and Q:
\[
\mathrm{COV} = \frac{\sum_{i=1}^{n} (P_i - \mu_P)(Q_i - \mu_Q)}{n - 1} \qquad \text{(EQ 8.7)}
\]
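As a concrete check of EQ 8.7, here is a minimal sketch in Python; the function name and the sample values are illustrative only:

```python
def covariance(P, Q):
    """Sample covariance of two equal-length sequences (EQ 8.7)."""
    n = len(P)
    mu_P = sum(P) / n
    mu_Q = sum(Q) / n
    return sum((p - mu_P) * (q - mu_Q) for p, q in zip(P, Q)) / (n - 1)

# Illustrative measurements of two features for five objects
P = [5.1, 4.9, 4.7, 4.6, 5.0]
Q = [3.5, 3.0, 3.2, 3.1, 3.6]
print(covariance(P, Q))   # positive value: P and Q tend to vary together
```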
So, covariance is a generalization of variance to multiple variables. The
Mahalanobis distance is much more computationally expensive than the other
distance measures, but it has the important advantage of being scale
independent, and so it is often used. For simplicity, however, many people use
the Euclidean distance instead, and without loss of generality most of the
remaining examples will use the Euclidean distance. Any distance measure may be
substituted, of course.
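A minimal sketch of the Mahalanobis distance in Python with NumPy follows; the function name and the random test data are illustrative, and the covariance matrix is assumed to be invertible (more samples than features):

```python
import numpy as np

def mahalanobis(v, data):
    """Mahalanobis distance from vector v to the mean of data.

    data is an (n_samples, n_features) array; weighting the difference
    by the inverse covariance matrix is what makes the distance
    scale independent.
    """
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)       # (n_features, n_features)
    cov_inv = np.linalg.inv(cov)
    d = v - mu
    return np.sqrt(d @ cov_inv @ d)

# Illustrative use with random data: 50 samples of 4 features
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))
print(mahalanobis(rng.normal(size=4), data))
```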
8.2.2 Distances Between Features
Many pattern recognition tasks use a large number of features to distinguish
between many classes. The Iris data set uses four features to characterize
three classes, which is already too many dimensions to visualize in a
straightforward way. This data set will be used to illustrate distance-based
classifiers, starting with the nearest neighbor classifier.
Given N classes $C_1, C_2, \ldots, C_N$ and M features $F_1 \ldots F_M$, consider the clas-
sification of an object, P. Measure all features for this object and create an
M-dimensional vector, v, from them. Feature vectors for all objects in all N
classes have also been created; the first such vector in class $C_1$ will be $C_1^1$, the eighth
one in class 3 will be $C_3^8$, and so on. Classification of P by the nearest neighbor
method involves calculating the distances between v and all feature vectors
for all the classes. The class of the feature vector having the minimum distance
from v is assigned to P.
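In symbols, if $d$ is the chosen distance measure, the class assigned to P is the one containing the nearest feature vector:

\[
\text{class}(P) = \arg\min_{j} \, \min_{i} \; d\!\left(v, C_j^i\right)
\]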
The name of the method is very descriptive. The class of an unknown target
will be the same as that of its nearest neighbor in feature space. Let’s see how
this works using the Iris data set. First, the set needs to be broken into training
data and test data: select the first half of the data for each class to be training
data, and the last half as test data.
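A minimal sketch of this split in Python, assuming the Iris data is available through scikit-learn's load_iris (any source providing the 150 samples grouped by class would work the same way):

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
train_X, train_y, test_X, test_y = [], [], [], []

# Iris has 50 samples per class: the first 25 of each class become
# training data, the last 25 become test data, as described above.
for c in np.unique(iris.target):
    X_c = iris.data[iris.target == c]
    half = len(X_c) // 2
    train_X.append(X_c[:half]); train_y += [c] * half
    test_X.append(X_c[half:]);  test_y += [c] * (len(X_c) - half)

train_X = np.vstack(train_X); test_X = np.vstack(test_X)
train_y = np.array(train_y);  test_y = np.array(test_y)
```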
Next, a feature vector is created for each test data item. There are
four features, so each vector has four components. Each test vector is compared
against (i.e., the distance is computed to) all of the training data vectors, and the
class of the one with the smallest distance is saved: this will be the class given to
the target. This is done for each of the test data items, and success rates are
computed; the raw success rate, the number of correct classifications divided
by the number of test data items, is a good indicator of how good the features
are and of how well the classifier will work overall.
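Continuing the split sketched above, a minimal nearest neighbor classifier and its raw success rate might look like this (again using Euclidean distance, with the names carried over from the previous sketch):

```python
def nearest_neighbor(v, train_X, train_y):
    """Return the class of the training vector nearest to v (Euclidean)."""
    dists = np.sqrt(((train_X - v) ** 2).sum(axis=1))
    return train_y[np.argmin(dists)]

correct = sum(nearest_neighbor(v, train_X, train_y) == t
              for v, t in zip(test_X, test_y))
print("Raw success rate:", correct / len(test_y))
```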

