Page 253 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

234      6 Statistical Classification


           vectors, x and y, of integer classification labels. The classification matrix of Table
           6.3 can be obtained as follows, assuming the cork data frame has been attached
           with columns ND, PRT and CL corresponding to variables N, PRT and CLASS,
           respectively:

              > y <- cbind(ND[1:100],PRT[1:100]/10)
              > co <- classify(y,y,CL[1:100])
              > classmatrix(CL[1:100],co)
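
              For readers without the book CD, the idea behind a classification matrix can be
           sketched in base R with table(). The label vectors below are made up for illustration
           and are not the cork stopper data; the names true_class, pred_class, cm and err are
           assumptions of this sketch, and the layout may differ from the output of the book's
           classmatrix function.

```r
# Hypothetical true and assigned class labels (illustrative only).
true_class <- c(1, 1, 1, 2, 2, 2, 2, 1)
pred_class <- c(1, 1, 2, 2, 2, 1, 2, 1)

# Cross-tabulate: rows are true classes, columns are assigned classes.
cm <- table(true = true_class, predicted = pred_class)
print(cm)

# Overall error rate: proportion of cases off the main diagonal.
err <- 1 - sum(diag(cm)) / sum(cm)
print(err)
```

              The diagonal of cm counts correctly classified cases, so the error rate is one
           minus the diagonal share.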

              The meanings of MATLAB's classify arguments are the same as in R.
           MATLAB does not provide a function for obtaining the classification matrix. The
           book CD includes the classmatrix function for this purpose, which works in
           the same way as in R.
              We didn't obtain the same values in MATLAB as we did with the other software
           products. The reason may be attributed to the fact that MATLAB apparently does
           not use pooled covariances (and therefore does not provide linear discriminants).




           6.3 Bayesian Classification

           In the previous sections, we presented linear classifiers based solely on the notion
           of distance to class means. We did not assume anything specific regarding the data
           distributions. In this section, we will take into account the specific probability
           distributions of the cases in each class, thereby being able to adjust the classifier to
           the specific risks of a classification.



           6.3.1 Bayes Rule for Minimum Risk
           Let us again consider the cork stopper problem and imagine that factory production
           was restricted to the two classes we have been considering, denoted as: $\omega_1$ = Super
           and $\omega_2$ = Average. Let us assume further that the factory had a record of production
           stocks for a reasonably long period, summarised as:

              Number of produced cork stoppers of class $\omega_1$:   $n_1$ =   901 420
              Number of produced cork stoppers of class $\omega_2$:   $n_2$ = 1 352 130
              Total number of produced cork stoppers:             $n$   = 2 253 550

              With this information, we can readily obtain good estimates of the probabilities
           of producing a cork stopper from either of the two classes, the so-called prior
           probabilities or prevalences:

              $P(\omega_1) = n_1/n = 0.4; \qquad P(\omega_2) = n_2/n = 0.6.$            (6.14)
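
              The arithmetic behind 6.14 can be checked in R. This is a minimal sketch using the
           production counts quoted above; the variable names n1, n2, n and prior are chosen
           here for illustration only.

```r
# Production counts from the factory record quoted in the text.
n1 <- 901420    # class omega_1 (Super)
n2 <- 1352130   # class omega_2 (Average)
n  <- n1 + n2   # total production

# Prior probabilities (prevalences) estimated as relative frequencies.
prior <- c(super = n1 / n, average = n2 / n)
print(prior)
```

              Because the priors are relative frequencies over the same total, they sum to one.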