Page 152 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB

TRAINING SETS                                                141

            these methods are expensive and therefore allowed only if the number of
            samples is not too large.
              The number of samples in the training set is denoted by $N_S$. The
            samples are enumerated by the symbol $n = 1, \ldots, N_S$. Object $n$ has a
            measurement vector $z_n$. The true class of the $n$-th object is denoted by
            $\theta_n \in \Omega$. Then, a labelled training set $T_S$ contains samples
            $(z_n, \theta_n)$, each one consisting of a measurement vector and its true class:

            $$ T_S = \{(z_n, \theta_n)\} \quad \text{with } n = 1, \ldots, N_S \tag{5.1} $$

            Another representation of the data set is obtained if we split the training
            set according to the true classes of its samples:

            $$ T_S = \{z_{k,n}\} \quad \text{with } k = 1, \ldots, K \text{ and } n = 1, \ldots, N_k \tag{5.2} $$

            where $N_k$ is the number of samples with class $\omega_k$ and $K = |\Omega|$ is the
            number of classes. A representation equivalent to (5.2) is to introduce a
            set $T_k$ for each class:

            $$ T_k = \{(z_n, \theta_n) \mid \theta_n = \omega_k\} \quad \text{with } k = 1, \ldots, K \text{ and } n = 1, \ldots, N_k \tag{5.3} $$

            It is understood that the numberings of samples used in these three
            representations do not coincide. Since the representations are equivalent,
            we have:

            $$ N_S = \sum_{k=1}^{K} N_k \tag{5.4} $$

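
            To make the renumbering concrete, consider a small hypothetical training
            set (an illustration, not taken from the text) with five samples and
            two classes. Written as in (5.1) and then split per class as in (5.3), it
            could look as follows:

$$
\begin{aligned}
T_S &= \{(z_1,\omega_1),\,(z_2,\omega_2),\,(z_3,\omega_1),\,(z_4,\omega_2),\,(z_5,\omega_2)\} \\
T_1 &= \{(z_1,\omega_1),\,(z_3,\omega_1)\}, \qquad N_1 = 2 \\
T_2 &= \{(z_2,\omega_2),\,(z_4,\omega_2),\,(z_5,\omega_2)\}, \qquad N_2 = 3 \\
N_S &= N_1 + N_2 = 2 + 3 = 5
\end{aligned}
$$

            In the notation of (5.2) the same samples would be renumbered within each
            class, for example $z_{1,2} = z_3$ and $z_{2,3} = z_5$, which is why the
            sample indices of the representations do not coincide.
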
            In PRTools, data sets are always represented as in (5.1). In order to
            obtain representations as in (5.3), separate data sets for each of the
            classes will have to be constructed. In Listing 5.1 these two ways are
            shown. It is assumed that dat is an $N \times d$ matrix containing the
            measurements, and lab an $N \times 1$ matrix containing the class labels.
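
            The relation between the two representations can also be sketched in plain
            MATLAB, without PRTools. The sketch below is an illustration only and is not
            the book's listing: the data values are made up, the labels are assumed to
            be numeric, and the names classes, T and N_k are hypothetical. Only dat and
            lab follow the text.

```matlab
% Toy data (hypothetical values): N = 5 objects, d = 2 measurements each.
dat = [0.1 0.9; 0.3 0.95; 0.2 0.7; 0.8 0.1; 0.7 0.3];   % N x d measurements
lab = [1; 2; 1; 2; 2];                                   % N x 1 numeric class labels

% Representation (5.1): a single set of (measurement, label) pairs.
N_S = size(dat, 1);                 % total number of samples

% Representation (5.3): one set of measurement vectors per class.
classes = unique(lab);              % distinct class labels, sorted
K = numel(classes);                 % number of classes
T = cell(K, 1);                     % T{k} holds the samples of class k
N_k = zeros(K, 1);                  % number of samples per class
for k = 1:K
    T{k} = dat(lab == classes(k), :);   % rows whose label equals class k
    N_k(k) = size(T{k}, 1);
end

% Consistency check corresponding to (5.4).
assert(sum(N_k) == N_S);
```

            Logical indexing keeps the original row order within each class, so the
            n-th row of T{k} is the n-th sample of that class in dat, not the n-th
            sample overall.
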

            Listing 5.1
            Two methods of representing data sets in PRTools. The first method is
            used almost exclusively.


            % Create a standard MATLAB dataset from data and labels.
            % Method (5.1):