Page 212 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB
Figure 6.6 Character classification for license plate recognition. (a) Character sets
from license plates, before and after normalization. (b) Selected features. The
number of features is 18 and 50, respectively.
A feature selection procedure based on the ‘plus l-take away r’
method (l = 3, r = 2) and the inter/intra distance (Section 6.1.1) gives
the feature sets depicted in Figure 6.6(b). Using a validation set of
about 50 000 samples, it was established that 50 features give the
minimal error rate; using more than 50 features introduces overfitting.
The pattern of 18 selected features shown in Figure 6.6(b) is one of
the intermediate results obtained on the way to the optimal set of 50
features. It indicates which parts of a bitmap are most important for
recognizing the character.
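The ‘plus l-take away r’ search used above can be sketched as follows. This is a minimal illustration in plain Python, not the PRTools implementation: the function name plus_l_take_away_r, the rule of stopping as soon as the subset reaches the requested size, and the toy criterion are all assumptions made for the example. Any criterion that scores a list of feature indices (higher is better), such as the inter/intra distance, could be passed in its place.

```python
def plus_l_take_away_r(n_features, criterion, target_size, l=3, r=2):
    """Return a list of feature indices chosen by a plus-l-take-away-r search.

    criterion(subset) must return a score to maximise for a non-empty list
    of feature indices. Requires l > r >= 0 and 1 <= target_size <= n_features.
    Assumption of this sketch: the search stops the first time the subset
    reaches target_size.
    """
    if not (l > r >= 0):
        raise ValueError("need l > r >= 0")
    if not (1 <= target_size <= n_features):
        raise ValueError("target_size must lie in 1..n_features")

    selected = []
    while True:
        # Plus l: add, one at a time, the feature that raises the criterion most.
        for _ in range(l):
            candidates = [f for f in range(n_features) if f not in selected]
            best = max(candidates, key=lambda f: criterion(selected + [f]))
            selected.append(best)
            if len(selected) == target_size:
                return selected
        # Take away r: drop, one at a time, the feature whose removal
        # leaves the highest criterion value.
        for _ in range(r):
            worst = max(
                selected,
                key=lambda f: criterion([g for g in selected if g != f]),
            )
            selected.remove(worst)


# Hypothetical toy criterion: every feature has a merit, but features 0 and 1
# are redundant, so selecting both is penalised.
merit = [5.0, 4.9, 3.0, 2.0, 1.0, 0.5]


def toy_criterion(subset):
    score = sum(merit[f] for f in subset)
    if 0 in subset and 1 in subset:
        score -= 4.0
    return score


chosen = plus_l_take_away_r(len(merit), toy_criterion, target_size=4)
print(sorted(chosen))
```

Because each cycle adds l features and removes r, the subset grows by l - r features per cycle, and the removal step gives the search a chance to discard a feature that later additions have made redundant.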
6.2.3 Implementation issues
PRTools offers a large range of feature selection methods. The
evaluation criteria are implemented in the function feateval, and
are basically all inter/intra cluster criteria. Additionally, a k-nearest
neighbour classification error is defined as a criterion. This gives
a reliable estimate of the classification complexity of the reduced
data set, but can be computationally very intensive. For larger data
sets it is therefore recommended to use the simpler inter/intra cluster
measures.
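To make the cost of a nearest-neighbour criterion concrete, the following is a minimal sketch of a leave-one-out 1-nearest-neighbour error evaluated on a feature subset. It is an illustration in plain Python under stated assumptions (k = 1, squared Euclidean distance, a hypothetical function name loo_nn_error, and made-up data); it does not reproduce how feateval computes its criterion.

```python
import math


def loo_nn_error(samples, labels, subset):
    """Leave-one-out 1-NN error rate using only the features in `subset`.

    samples: list of equal-length feature tuples; labels: list of class labels.
    Every sample is classified by its nearest other sample, so one evaluation
    costs on the order of len(samples)**2 * len(subset) operations.
    """
    errors = 0
    for i, x in enumerate(samples):
        nearest, nearest_dist = None, math.inf
        for j, y in enumerate(samples):
            if j == i:
                continue  # leave the sample itself out
            dist = sum((x[f] - y[f]) ** 2 for f in subset)
            if dist < nearest_dist:
                nearest, nearest_dist = j, dist
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / len(samples)


# Made-up data: feature 0 separates the two classes, feature 1 does not.
samples = [(0.0, 5.0), (0.1, -5.0), (1.0, 5.1), (1.1, -5.1)]
labels = [0, 0, 1, 1]

for subset in ([0], [1], [0, 1]):
    print(subset, loo_nn_error(samples, labels, subset))
```

The double loop over all samples is what makes this kind of criterion expensive: a search strategy calls it once per candidate subset, so the quadratic cost is paid many times. To use it with a search that maximises a criterion, one would pass 1 minus the error.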
PRTools also offers several search strategies, namely the
branch-and-bound algorithm, plus-l-take-away-r, forward selection and
backward selection. Feature selection mappings can be found using the
function featselm. The following listing is an example.