Page 152 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB
these methods are expensive and therefore allowed only if the number of
samples is not too large.
The number of samples in the training set is denoted by $N_S$. The
samples are enumerated by the symbol $n = 1, \ldots, N_S$. Object $n$ has a
measurement vector $\mathbf{z}_n$. The true class of the $n$-th object is denoted by
$\theta_n \in \Omega$. Then, a labelled training set $T_S$ contains samples $(\mathbf{z}_n, \theta_n)$, each
one consisting of a measurement vector and its true class:
$$T_S = \{(\mathbf{z}_n, \theta_n)\} \quad \text{with} \quad n = 1, \ldots, N_S \tag{5.1}$$
Another representation of the data set is obtained if we split the training
set according to their true classes:
$$T_S = \{\mathbf{z}_{k,n}\} \quad \text{with} \quad k = 1, \ldots, K \quad \text{and} \quad n = 1, \ldots, N_k \tag{5.2}$$
where $N_k$ is the number of samples with class $\omega_k$ and $K = |\Omega|$ is the
number of classes. A representation equivalent to (5.2) is to introduce a
set $T_k$ for each class:
$$T_k = \{(\mathbf{z}_n, \theta_n) \mid \theta_n = \omega_k\} \quad \text{with} \quad k = 1, \ldots, K \quad \text{and} \quad n = 1, \ldots, N_k \tag{5.3}$$
It is understood that the numberings of samples used in these three
representations do not coincide. Since the representations are equivalent,
we have:
$$N_S = \sum_{k=1}^{K} N_k \tag{5.4}$$
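As a small worked illustration of (5.1) to (5.4), consider a hypothetical training set with $N_S = 5$ samples and $K = 3$ classes. The labels below are invented for the example. For readability, the per-class sets keep the sample indices of (5.1) rather than renumbering within each class.

```latex
\begin{align*}
T_S &= \{(\mathbf{z}_1,\omega_1),\ (\mathbf{z}_2,\omega_2),\ (\mathbf{z}_3,\omega_1),\
        (\mathbf{z}_4,\omega_3),\ (\mathbf{z}_5,\omega_2)\} \\
T_1 &= \{(\mathbf{z}_1,\omega_1),\ (\mathbf{z}_3,\omega_1)\}, \qquad N_1 = 2 \\
T_2 &= \{(\mathbf{z}_2,\omega_2),\ (\mathbf{z}_5,\omega_2)\}, \qquad N_2 = 2 \\
T_3 &= \{(\mathbf{z}_4,\omega_3)\}, \qquad\qquad\qquad\ \ N_3 = 1 \\
N_S &= N_1 + N_2 + N_3 = 2 + 2 + 1 = 5
\end{align*}
```

In the notation of (5.2), the same data would be renumbered per class, for example $\mathbf{z}_{1,1} = \mathbf{z}_1$, $\mathbf{z}_{1,2} = \mathbf{z}_3$ and $\mathbf{z}_{3,1} = \mathbf{z}_4$.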
In PRTools, data sets are always represented as in (5.1). In order to
obtain representations as in (5.3), separate data sets for each of the
classes will have to be constructed. In Listing 5.1 these two ways are
shown. It is assumed that dat is an $N \times d$ matrix containing the
measurements, and lab an $N \times 1$ matrix containing the class labels.
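Independently of PRTools, the step from representation (5.1) to the per-class sets of (5.3) can be sketched in plain MATLAB. This is only a minimal illustration under stated assumptions: the toy values in dat and lab are invented, the labels are assumed to be numeric, and the names classes, Tk and Nk are hypothetical helpers that do not come from PRTools.

```matlab
% Hypothetical toy data: N = 5 samples with d = 2 measurements each.
dat = [0.1 0.9; 0.3 0.95; 0.2 0.7; 0.8 0.1; 0.7 0.3];   % N x d measurements
lab = [1; 2; 1; 3; 2];                                  % N x 1 numeric class labels

% Representation (5.1): the pair (dat, lab), one row per sample.
classes = unique(lab);        % distinct class labels, sorted
K = numel(classes);           % number of classes

% Representation (5.3): one measurement matrix per class.
Tk = cell(K, 1);              % Tk{k} holds the samples of class k
Nk = zeros(K, 1);             % Nk(k) is the number of samples in class k
for k = 1:K
    Tk{k} = dat(lab == classes(k), :);   % select rows by logical indexing
    Nk(k) = size(Tk{k}, 1);
end

% Consistency check corresponding to (5.4): class sizes sum to N.
assert(sum(Nk) == size(dat, 1));
```

A cell array is used here because the classes generally have different sizes, so the per-class matrices cannot be stacked into a single three-dimensional array.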
Listing 5.1
Two methods of representing data sets in PRTools. The first method is
used almost exclusively.
% Create a standard MATLAB dataset from data and labels.
% Method (5.1):