Page 331 - Computational Statistics Handbook with MATLAB

The features we are using for classification are denoted by the d-dimensional vector $\mathbf{x}$, $d = 1, 2, \dots$ . With the iris data, we have four measurements, so $d = 4$. In the supervised learning situation, each of the observed feature vectors will also have a class label attached to it.
                              Our goal is to use the data to create a decision rule or classifier that will take
                             a feature vector x whose class membership is unknown and return the class
                             it most likely belongs to. A logical way to achieve this is to assign the class
                             label to this feature vector using the class corresponding to the highest pos-
                             terior probability. This probability is given by

\[
P(\omega_j \mid \mathbf{x}); \qquad j = 1, \dots, J. \tag{9.1}
\]
                              Equation 9.1 represents the probability that the case belongs to the j-th class
                             given the observed feature vector x. To use this rule, we would evaluate all of
                             the J posterior probabilities, and the one with the highest probability would
                             be the class we choose. We can find the posterior probabilities using Bayes’
                             Theorem:
\[
P(\omega_j \mid \mathbf{x}) = \frac{P(\omega_j)\,P(\mathbf{x} \mid \omega_j)}{P(\mathbf{x})}, \tag{9.2}
\]
                             where
\[
P(\mathbf{x}) = \sum_{j=1}^{J} P(\omega_j)\,P(\mathbf{x} \mid \omega_j). \tag{9.3}
\]
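To make Equations 9.2 and 9.3 concrete, here is a minimal sketch (not from the book, which uses MATLAB) of computing the posteriors for a single observed feature vector; the prior and class-conditional values are made-up numbers for $J = 3$ classes.

```python
# Hypothetical example: posteriors via Bayes' Theorem for J = 3 classes.
# priors[j] plays the role of P(w_j) and likelihoods[j] the role of
# P(x | w_j) evaluated at the observed feature vector x.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.1, 0.4, 0.7]

# Equation 9.3: the total probability P(x) as the normalizing sum
p_x = sum(p * lik for p, lik in zip(priors, likelihoods))

# Equation 9.2: the posterior P(w_j | x) for each class
posteriors = [p * lik / p_x for p, lik in zip(priors, likelihoods)]

# Decision rule: assign the class with the highest posterior probability
best = max(range(len(posteriors)), key=lambda j: posteriors[j])
```

Note that the posteriors sum to one by construction, so in practice the division by $P(\mathbf{x})$ can be skipped when only the argmax is needed.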
We see from Equation 9.2 that we must know the prior probability that a case belongs to the j-th class, given by
\[
P(\omega_j); \qquad j = 1, \dots, J, \tag{9.4}
\]
                             and the class-conditional probability (sometimes called the state-condi-
                             tional probability)

\[
P(\mathbf{x} \mid \omega_j); \qquad j = 1, \dots, J. \tag{9.5}
\]
                              The class-conditional probability in Equation 9.5 represents the probability
                             distribution of the features for each class. The prior probability in Equation
                             9.4 represents our initial degree of belief that an observed set of features is a
                             case from the j-th class. The process of estimating these probabilities is how
                             we build the classifier.
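The whole process can be sketched in a few lines. The following is a minimal illustration (in Python rather than the book's MATLAB), assuming univariate Gaussian class-conditional densities and priors estimated as relative class frequencies; the data are synthetic, not the iris measurements.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Normal density used as the class-conditional P(x | w_j)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit(samples_by_class):
    """Estimate P(w_j) from relative frequencies and (mu, sigma) per class."""
    n_total = sum(len(s) for s in samples_by_class.values())
    model = {}
    for label, s in samples_by_class.items():
        mu = sum(s) / len(s)
        var = sum((v - mu) ** 2 for v in s) / (len(s) - 1)
        model[label] = (len(s) / n_total, mu, math.sqrt(var))
    return model

def classify(x, model):
    """Assign the class with the highest posterior; P(x) cancels in the argmax."""
    return max(model, key=lambda lab: model[lab][0] * gauss_pdf(x, *model[lab][1:]))

# Synthetic one-feature training data for two classes (labels 0 and 1)
data = {0: [4.9, 5.0, 5.1, 4.8], 1: [6.3, 6.5, 6.4, 6.6]}
model = fit(data)
label = classify(5.0, model)
```

The Gaussian assumption here is only one choice for estimating the class-conditional probabilities; the estimation methods are taken up in the sections that follow.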
We start our explanation with the prior probabilities. These can be inferred from prior knowledge of the application, estimated from the data, or
                            © 2002 by Chapman & Hall/CRC