6 Statistical Classification

Statistical classification deals with rules for assigning cases to categories or classes. The classification, or decision, rule is expressed in terms of a set of random variables − the case features. In order to derive the decision rule, one assumes that a training set of pre-classified cases − the data sample − is available and can be used to determine the sought-after rule applicable to new cases. The decision rule can be derived in a model-based approach, whenever a joint distribution of the random variables can be assumed, or in a model-free approach otherwise.
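
As a concrete illustration of the two approaches, the following R sketch contrasts a model-based rule − linear discriminant analysis, which assumes Gaussian feature distributions − with a model-free rule − k-nearest neighbours. The data set and feature choices below are illustrative assumptions, not taken from the text; the sketch assumes the MASS and class packages are installed.

   # Model-based rule: linear discriminant analysis (Gaussian assumption).
   # Model-free rule: k-nearest neighbours (no distributional assumption).
   library(MASS)    # lda()
   library(class)   # knn()

   # Illustrative training set: two classes, two features (iris stands in
   # for a generic pre-classified data sample).
   train <- subset(iris, Species != "virginica",
                   select = c(Sepal.Length, Sepal.Width, Species))
   train$Species <- droplevels(train$Species)

   # Model-based decision rule derived from the training set.
   lda.rule <- lda(Species ~ Sepal.Length + Sepal.Width, data = train)

   # A new, unclassified case (hypothetical feature values).
   new.case <- data.frame(Sepal.Length = 5.8, Sepal.Width = 3.1)
   predict(lda.rule, new.case)$class

   # Model-free decision rule applied to the same case.
   knn(train[, 1:2], new.case, cl = train$Species, k = 3)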



6.1 Decision Regions and Functions

Consider a data sample constituted by n cases, depending on d features. The central idea in statistical classification is to use the data sample, represented by vectors in an ℜ^d feature space, in order to derive a decision rule that partitions the feature space into regions assigned to the classification classes. These regions are called decision regions. If a feature vector falls into a certain decision region, the associated case is assigned to the corresponding class.
   Let us assume two classes, ω1 and ω2, of cases described by two-dimensional feature vectors (coordinates x1 and x2) as shown in Figure 6.1. The features are random variables, X1 and X2, respectively.
   Each case is represented by a vector x = [x1 x2]' ∈ ℜ². In Figure 6.1, we used "o" to denote class ω1 cases and "×" to denote class ω2 cases. In general, the cases of each class will be characterised by random distributions of the corresponding feature vectors, as illustrated in Figure 6.1, where the ellipses represent equal-probability density curves that enclose most of the cases.
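
A data set with the general appearance of Figure 6.1 can be simulated in R. In the sketch below the two classes are drawn from bivariate normal distributions; the means and covariance are chosen purely for illustration and are not the book's values.

   library(MASS)  # mvrnorm(): multivariate normal samples

   set.seed(1)
   Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)         # common covariance
   class1 <- mvrnorm(50, mu = c(1, 1), Sigma = Sigma)   # class ω1 cases
   class2 <- mvrnorm(50, mu = c(4, 4), Sigma = Sigma)   # class ω2 cases

   # Scatter plot in the (x1, x2) feature plane: "o" for ω1, "×" for ω2.
   plot(rbind(class1, class2), type = "n", xlab = "x1", ylab = "x2")
   points(class1, pch = 1)   # circles
   points(class2, pch = 4)   # crosses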
   Figure 6.1 also shows a straight line separating the two classes. We can easily write the equation of the straight line in terms of the features X1, X2, using coefficients or weights w1, w2 and a bias term w0, as shown in equation 6.1. The weights determine the slope of the straight line; the bias determines where the straight line intersects the axes.

   d_{X1,X2}(x) ≡ d(x) = w1 x1 + w2 x2 + w0 = 0 .        6.1

   Equation 6.1 also allows interpretation of the straight line as the root set of a linear function d(x). We say that d(x) is a linear decision function that divides (linearly separates) the feature space into two decision regions.
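
Continuing the R sketch above, the decision rule implied by equation 6.1 can be applied by evaluating the sign of d(x). The weight values below are hypothetical, chosen only so that the line passes between the two simulated point clouds.

   # Linear decision function d(x) = w1*x1 + w2*x2 + w0 (hypothetical weights).
   w1 <- -1; w2 <- -1; w0 <- 5
   d <- function(x1, x2) w1 * x1 + w2 * x2 + w0

   # Decision rule: assign ω1 when d(x) > 0 and ω2 when d(x) < 0.
   cases <- data.frame(x1 = c(1.2, 4.1), x2 = c(0.8, 3.9))
   ifelse(d(cases$x1, cases$x2) > 0, "omega1", "omega2")

   # The root set d(x) = 0 is the separating line x2 = -(w1*x1 + w0)/w2;
   # abline() adds it to the scatter plot produced earlier.
   abline(a = -w0 / w2, b = -w1 / w2)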