Page 275 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

256      6 Statistical Classification


             Step  Entered  Removed  Min. D Squared
                                     Statistic  Between Groups  Exact F
                                                                Statistic  df1  df2      Sig.
             1     PRT               2.401      1.00 and 2.00   60.015     1    147.000  1.176E-12
             2     PRM               3.083      1.00 and 2.00   38.279     2    146.000  4.330E-14
             3     N                 4.944      1.00 and 2.00   40.638     3    145.000  .000
             4     ARTG              5.267      1.00 and 2.00   32.248     4    144.000  7.438E-15
             5              PRT      5.098      1.00 and 2.00   41.903     3    145.000  .000
             6     RAAR              6.473      1.00 and 2.00   39.629     4    144.000  2.316E-22

           Figure 6.22. Feature selection listing, obtained with SPSS (Stepwise Method;
           Mahalanobis), using a dynamic search on the cork stopper data (three classes).



           6.6 Classifier Evaluation

           The determination of reliable estimates of a classifier's error rate is an
           essential task, both to assess its usefulness and to compare it with alternative
           solutions.
              As explained in section 6.3.3, design set estimates are on average optimistic, and
           the same can be said of an error formula such as 6.25 when the true means
           and covariance are replaced by their sample estimates. It is therefore mandatory
           that the classifier be empirically tested using a test set of independent cases. As
           previously mentioned in section 6.3.3, these test set estimates are, on average,
           pessimistic.
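
              The optimistic and pessimistic tendencies described above can be illustrated with
           a small simulation. The following is only a sketch under stated assumptions, not a
           procedure from the text: two Gaussian classes with unit covariance, a
           nearest-centroid (minimum Euclidean distance) classifier, and hypothetical parameter
           names and values (`n_design`, `n_test`, `dim`, `shift`, `reps`) chosen purely for
           illustration.

```python
import random

def sample(mean, n, rng):
    """Draw n points from a unit-covariance Gaussian centred at `mean`."""
    return [[rng.gauss(m, 1.0) for m in mean] for _ in range(n)]

def centroid(points):
    """Component-wise sample mean of a list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def sq_dist(x, c):
    """Squared Euclidean distance between point x and centroid c."""
    return sum((a - b) ** 2 for a, b in zip(x, c))

def error_rate(sets, centroids):
    """Fraction of cases misclassified by the nearest-centroid rule.

    `sets[k]` holds the cases whose true class is k.
    """
    errors = total = 0
    for label, points in enumerate(sets):
        for x in points:
            predicted = min(range(len(centroids)),
                            key=lambda k: sq_dist(x, centroids[k]))
            errors += predicted != label
            total += 1
    return errors / total

def simulate(n_design=10, n_test=200, dim=5, shift=2.0, reps=200, seed=0):
    """Average design-set and test-set error estimates over `reps` repetitions.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # The two class means differ only in the first coordinate.
    means = [[0.0] * dim, [shift] + [0.0] * (dim - 1)]
    design_err = test_err = 0.0
    for _ in range(reps):
        design = [sample(m, n_design, rng) for m in means]
        test = [sample(m, n_test, rng) for m in means]
        cents = [centroid(pts) for pts in design]   # classifier design
        design_err += error_rate(design, cents)     # tested on design cases
        test_err += error_rate(test, cents)         # tested on independent cases
    return design_err / reps, test_err / reps

design_avg, test_avg = simulate()
print(f"design-set estimate: {design_avg:.3f}  test-set estimate: {test_avg:.3f}")
```

              With a small design set, the averaged design-set estimate should come out below
           the averaged test-set estimate, which is the behaviour referred to in section 6.3.3.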
              The influence of the finite sample sizes can be summarised as follows (for
           details, consult Fukunaga K, 1990):

              −  The bias (the deviation of the error estimate from the true error) is
                 predominantly influenced by the finiteness of the design set;
              −  The variance of the error estimate is predominantly influenced by the
                 finiteness of the test set.

              In normal practice, we only have a data set S with n samples available. The
           problem then arises of how to divide the available cases into a design set and a
           test set. Among a vast number of methods (see e.g. Fukunaga K, Hayes RR, 1989b) the
           following ones are easily implemented in SPSS and/or STATISTICA: