The features we are using for classification are denoted by the d-dimensional vector x, d = 1, 2, …. With the iris data, we have four measurements, so d = 4. In the supervised learning situation, each of the observed feature vectors will also have a class label attached to it.
Our goal is to use the data to create a decision rule or classifier that will take
a feature vector x whose class membership is unknown and return the class
it most likely belongs to. A logical way to achieve this is to assign to the feature vector the class label corresponding to the highest posterior probability. This probability is given by
$$P(\omega_j \mid \mathbf{x}); \qquad j = 1, \ldots, J. \tag{9.1}$$
Equation 9.1 represents the probability that the case belongs to the j-th class
given the observed feature vector x. To use this rule, we would evaluate all of
the J posterior probabilities, and the one with the highest probability would
be the class we choose. We can find the posterior probabilities using Bayes’
Theorem:
$$P(\omega_j \mid \mathbf{x}) = \frac{P(\omega_j)\,P(\mathbf{x} \mid \omega_j)}{P(\mathbf{x})}\,, \tag{9.2}$$
where
$$P(\mathbf{x}) = \sum_{j=1}^{J} P(\omega_j)\,P(\mathbf{x} \mid \omega_j). \tag{9.3}$$
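To make the rule concrete, the following MATLAB sketch evaluates the posterior probabilities for a single observed feature vector and assigns it to the class with the largest one. The prior and class-conditional values are hypothetical numbers supplied only for illustration; estimating these quantities from data is discussed below.

    % A minimal sketch of the decision rule in Equations 9.1 - 9.3.
    % The values below are hypothetical, for one observed feature
    % vector x and J = 3 classes.
    priors = [0.5 0.3 0.2];           % P(omega_j)
    likelihoods = [0.10 0.40 0.05];   % P(x | omega_j) evaluated at x
    px = sum(priors.*likelihoods);        % P(x), Equation 9.3
    posteriors = priors.*likelihoods/px;  % P(omega_j | x), Equation 9.2
    [maxpost, jhat] = max(posteriors);    % choose the class with the
                                          % highest posterior probability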
We see from Equation 9.2 that we must know the prior probability that the case belongs to the j-th class, given by
$$P(\omega_j); \qquad j = 1, \ldots, J, \tag{9.4}$$
and the class-conditional probability (sometimes called the state-conditional probability)

$$P(\mathbf{x} \mid \omega_j); \qquad j = 1, \ldots, J. \tag{9.5}$$
The class-conditional probability in Equation 9.5 represents the probability
distribution of the features for each class. The prior probability in Equation
9.4 represents our initial degree of belief that an observed set of features is a
case from the j-th class. The process of estimating these probabilities is how
we build the classifier.
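For instance, with a labeled training set such as the iris data, these probabilities might be estimated as in the following sketch. The variable names X (an n-by-d matrix of feature vectors) and labels (class labels 1, …, J) are assumed for illustration, and modeling each class-conditional probability as a multivariate normal is only one possible choice.

    % A sketch of estimating the quantities in Equations 9.4 and 9.5.
    % Assumed inputs: X is n-by-d (d = 4 for the iris data); labels is
    % n-by-1 with values 1, ..., J.
    J = max(labels);
    n = size(X,1);
    priors = zeros(1,J);
    for j = 1:J
       ind = (labels == j);
       priors(j) = sum(ind)/n;     % prior as a relative frequency
       mu{j} = mean(X(ind,:));     % class-conditional modeled here as
       Sigma{j} = cov(X(ind,:));   % multivariate normal (an assumption)
    end
    % For a new observation x (1-by-d), P(x | omega_j) could then be
    % evaluated with mvnpdf(x, mu{j}, Sigma{j}) (Statistics Toolbox).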
We start our explanation with the prior probabilities. These can either be
inferred from prior knowledge of the application, estimated from the data or