Page 346 - Computational Statistics Handbook with MATLAB

Chapter 9: Statistical Pattern Recognition                      335


                             Using this type of classifier and this partition of the learning sample, we
                             estimate the probability of correct classification to be 0.74.




                             Cross-Validation
                             The cross-validation procedure is discussed in detail in Chapter 7. Recall that
                             with cross-validation, we systematically partition the data into testing sets of
                             size k. For each testing set, the other n - k observations are used to build the
                             classifier, and the k held-out patterns are used to test it. We continue in this
                             way through the entire data set. When the sample is too small to split into a
                             single testing set and a single training set, cross-validation is the recommended
                             approach. The following is the procedure for calculating the probability of
                             correct classification using cross-validation with k = 1.


                             PROBABILITY OF CORRECT CLASSIFICATION - CROSS-VALIDATION

                                1. Set the number of correctly classified patterns to 0, N_{CC} = 0.
                                2. Keep out one observation, call it x_i.
                                3. Build the classifier using the remaining n - 1 observations.
                                4. Present the observation x_i to the classifier and obtain a class label
                                   using the classifier from the previous step.
                                5. If the class label is correct, then increment the number correctly
                                   classified using

                                                       N_{CC} = N_{CC} + 1.

                                6. Repeat steps 2 through 5 for each pattern in the sample.
                                7. The probability of correctly classifying an observation is given by

                                                       P(CC) = N_{CC} / n.
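
The seven steps above can be sketched in a few lines of MATLAB. This is only an illustration under stated assumptions: the ten two-dimensional points are made-up toy data, and the classifier (assign x_i to the class whose mean vector is nearer) is a simple stand-in chosen for brevity, not the classifier used elsewhere in this chapter. The variable names X, labels, ncc and pcc are likewise hypothetical.

```matlab
% Toy data (hypothetical): two classes in two dimensions.
X = [0 0; 1 0; 0 1; 1 1; 5 5; ...   % class 1 (the last point is an outlier)
     4 4; 5 4; 4 5; 6 6; 6 5];      % class 2
labels = [1 1 1 1 1 2 2 2 2 2]';
n = size(X, 1);

ncc = 0;                            % step 1: N_CC = 0
for i = 1:n
    % Step 2: keep out observation x_i.
    xi = X(i, :);
    keep = true(n, 1);
    keep(i) = false;
    Xtrain = X(keep, :);
    ytrain = labels(keep);

    % Step 3: build the classifier from the remaining n - 1 observations.
    % Assumed stand-in classifier: one mean vector per class.
    m1 = mean(Xtrain(ytrain == 1, :), 1);
    m2 = mean(Xtrain(ytrain == 2, :), 1);

    % Step 4: label x_i by the nearer class mean (squared distance).
    d1 = sum((xi - m1).^2);
    d2 = sum((xi - m2).^2);
    if d1 <= d2
        predicted = 1;
    else
        predicted = 2;
    end

    % Step 5: increment N_CC when the label is correct.
    if predicted == labels(i)
        ncc = ncc + 1;
    end
end                                 % step 6: repeat for every pattern

pcc = ncc / n;                      % step 7: P(CC) = N_CC / n
fprintf('N_CC = %d, P(CC) = %.2f\n', ncc, pcc);
```

With this toy data, only the outlier (5, 5) is misclassified when it is held out, so the sketch reports N_CC = 9 and P(CC) = 0.90. Note that the class means are recomputed inside the loop, so the held-out pattern never influences the classifier that labels it.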
                             Example 9.7
                             We return to the iris data of Example 9.6, and we estimate the probability
                             of correct classification using cross-validation with k = 1. We first set up
                             some preliminary variables and load the data.
                                load iris
                                % This loads up three matrices:
                                % setosa, versicolor and virginica.
                                % We will use the versicolor and virginica.
                                % Note that the priors are equal, so the decision is
                            © 2002 by Chapman & Hall/CRC