
            indicating that some overtraining occurs and/or that some feature selection
            and feature scaling might be required. The next step, therefore, is to rescale
            the data to have zero mean and unit variance in all feature directions. This
            can be performed using the PRTools function scalem([],'variance'):

            Listing 9.2

            % Load the housing data set
            load housing.mat;
            % Define the variance-scaling mapping and combine it with
            % an untrained linear classifier
            w_sc = scalem([],'variance');
            w = w_sc*ldc;
            % Perform 5-fold cross-validation
            err_ldc_sc = crossval(z,w,5)
            % Do the same for some other classifiers
            err_qdc_sc = crossval(z,w_sc*qdc,5)
            err_knnc_sc = crossval(z,w_sc*knnc,5)
            err_parzenc_sc = crossval(z,w_sc*parzenc,5)

            First note that when we introduce a preprocessing step, this step should
            be defined inside the mapping w. The obvious approach, mapping the
            whole data set with z_sc = z*scalem(z,'variance') and then applying
            the cross-validation to estimate the classifier performance, is
            incorrect. In that case, some of the test data has already been used
            to fit the scaling, resulting in an overtrained classifier, and thus
            in an optimistically biased estimate of the error. To avoid this, the
            mapping should be extended from ldc to w_sc*ldc. The routine
            crossval then takes care of fitting both the scaling and the
            classifier on the training folds only.
              Scaling the features should not change the performance of the first
            two classifiers, ldc and qdc. These normal density based classifiers
            are insensitive to the scaling of individual features, because they
            already estimate the feature variances themselves. The performance of
            knnc and parzenc, on the other hand, improves significantly, to
            14.1% and 13.1%, respectively (with a standard deviation of about
            0.4%). Although parzenc now approaches the performance of the linear
            classifier, it is still slightly worse. Perhaps feature extraction or
            feature selection will improve the results.
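              This scale invariance is easy to verify. A minimal sketch (reusing
            z from Listing 9.2) cross-validates ldc with and without the scaling
            step; the two error estimates should differ only by the random
            assignment of the folds:

            % ldc on the raw features
            err_raw = crossval(z,ldc,5)
            % ldc with variance scaling fitted inside each training fold
            err_sc = crossval(z,scalem([],'variance')*ldc,5)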
              As discussed in Chapter 7, principal component analysis (PCA) is
            one of the most frequently used feature extraction methods. It
            concentrates on the high-variance directions in the data and removes
            the low-variance directions.
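              In PRTools, PCA can be chained into the pipeline in the same way as
            the scaling step. The sketch below assumes the pca mapping (renamed
            pcam in later PRTools releases), where a fractional argument retains
            the components that explain that fraction of the total variance:

            % PCA on the raw features, keeping 90% of the variance; the
            % result is dominated by large-scale features such as TAX
            err_pca = crossval(z,pca([],0.9)*ldc,5)
            % PCA after variance scaling, both fitted inside each training fold
            err_sc_pca = crossval(z,scalem([],'variance')*pca([],0.9)*ldc,5)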
            In this data set we have seen that the feature values have very
            different scales. Applying PCA directly to these data will place
            heavy emphasis on the feature TAX and will probably ignore the
            feature NOX. Indeed, when