Page 346 - Computational Statistics Handbook with MATLAB


Chapter 9: Statistical Pattern Recognition 335

Using this type of classifier and this partition of the learning sample, we
estimate the probability of correct classification to be 0.74.

Cross-Validation
The cross-validation procedure is discussed in detail in Chapter 7. Recall that
with cross-validation, we systematically partition the data into testing sets of
size k. The n - k observations are used to build the classifier, and the remaining
k patterns are used to test it. We continue in this way through the entire data
set. When the sample is too small to partition it into a single testing and
training set, then cross-validation is the recommended approach. The following is
the procedure for calculating the probability of correct classification using
cross-validation with k = 1.
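
Before turning to the procedure, the partitioning described above can be sketched in a few lines of MATLAB. This is only an illustration under stated assumptions: the values of n and k and the variable names (numFolds, testIdx, trainIdx) are hypothetical and do not come from the text, and the testing sets are taken as consecutive blocks of indices.

```matlab
% Hypothetical sizes chosen for illustration only.
n = 10;                      % number of observations in the sample
k = 2;                       % size of each testing set
numFolds = ceil(n/k);        % number of testing sets needed to cover the data

for f = 1:numFolds
    % Indices of the k patterns held out for testing in this pass.
    testIdx = ((f-1)*k + 1):min(f*k, n);
    % The remaining n - k indices are used to build the classifier.
    trainIdx = setdiff(1:n, testIdx);
    fprintf('Pass %d: test = [%s], training size = %d\n', ...
        f, num2str(testIdx), numel(trainIdx));
end
```

Setting k = 1 in this sketch gives n passes, each holding out a single observation, which is the case used in the procedure.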

PROBABILITY OF CORRECT CLASSIFICATION - CROSS-VALIDATION

1. Set the number of correctly classified patterns to 0, $N_{CC} = 0$.
2. Keep out one observation, call it $x_i$.
3. Build the classifier using the remaining $n - 1$ observations.
4. Present the observation $x_i$ to the classifier and obtain a class label
using the classifier from the previous step.
5. If the class label is correct, then increment the number correctly
classified using

$$N_{CC} = N_{CC} + 1.$$

6. Repeat steps 2 through 5 for each pattern in the sample.
7. The probability of correctly classifying an observation is given by

$$P(CC) = \frac{N_{CC}}{n}.$$

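
The seven steps above can be sketched as a single loop. The sketch below is a minimal illustration, not the classifier used in the examples of this chapter: it assumes a simple nearest-class-mean rule as a stand-in, and the data, labels, and variable names (X, labels, ncc, pcc) are hypothetical.

```matlab
% Hypothetical two-class, one-dimensional data (NOT the iris data).
X      = [1.0; 1.2; 0.8; 1.1; 3.0; 3.2; 2.9; 3.1];
labels = [1;   1;   1;   1;   2;   2;   2;   2];
n = size(X, 1);

ncc = 0;                                  % Step 1: none correct yet
for i = 1:n
    xi = X(i, :);                         % Step 2: keep out observation i
    keep = true(n, 1);
    keep(i) = false;
    Xtrain = X(keep, :);
    ytrain = labels(keep);

    % Step 3: build the stand-in classifier from the remaining n - 1
    % observations (assumed here to be the two class means).
    m1 = mean(Xtrain(ytrain == 1, :), 1);
    m2 = mean(Xtrain(ytrain == 2, :), 1);

    % Step 4: present xi and assign the label of the closer class mean.
    if norm(xi - m1) <= norm(xi - m2)
        predicted = 1;
    else
        predicted = 2;
    end

    % Step 5: increment the count if the label is correct.
    if predicted == labels(i)
        ncc = ncc + 1;
    end
end                                       % Step 6: repeat for each pattern

pcc = ncc / n;                            % Step 7: estimate of P(CC)
```

Only the lines under steps 3 and 4 depend on the choice of classifier; any other rule can be substituted there without changing the rest of the loop.
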
Example 9.7
We return to the iris data of Example 9.6, and we estimate the probability
of correct classification using cross-validation with k = 1. We first set up
some preliminary variables and load the data.
load iris
% This loads up three matrices:
% setosa, versicolor and virginica.
% We will use the versicolor and virginica.
% Note that the priors are equal, so the decision is
© 2002 by Chapman & Hall/CRC
