144 4 Statistical Classification
4.11 Repeat Exercise 4.4, considering only two classes: N and P. Determine afterwards
which reject threshold best matches the S (suspect) cases.
4.12 Use the Parzen.xls file to repeat the experiments shown in Figure 4.28 for other types
of distributions, namely the normal and the logistic distributions.
4.13 Apply the Parzen window method to the first two classes of the cork stoppers data with
features N and PRT10, using the probabilistic neural network approach for pattern
classification. Also use the weight values to derive the probability density estimates
(limit the training set to 10 cases per class and use Microsoft Excel).
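
As a starting point for Exercise 4.13, the following is a minimal sketch of a Parzen window density estimate and the resulting classification rule. It assumes a Gaussian kernel with a single smoothing parameter `h` and equal class priors; the function names `parzen_density` and `pnn_classify` and the toy points are hypothetical and are not taken from the cork stoppers data or from any particular neural network package.

```python
import math


def parzen_density(x, samples, h):
    """Parzen estimate of the density at x using a Gaussian kernel of width h.

    x is a tuple of feature values; samples is a list of such tuples
    (the training cases of one class). The kernel choice is an assumption.
    """
    d = len(x)
    norm = (2.0 * math.pi) ** (d / 2.0) * h ** d  # Gaussian normalising constant
    total = 0.0
    for xi in samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        total += math.exp(-sq_dist / (2.0 * h * h))
    return total / (len(samples) * norm)


def pnn_classify(x, class_samples, h):
    """Assign x to the class with the largest estimated density (equal priors assumed)."""
    return max(class_samples, key=lambda c: parzen_density(x, class_samples[c], h))


# Hypothetical toy training set with two features per case.
toy = {
    "class 1": [(0.0, 0.0), (1.0, 0.0)],
    "class 2": [(5.0, 5.0), (6.0, 5.0)],
}
print(pnn_classify((0.5, 0.2), toy, h=1.0))
```

With real data the two features may have very different scales, so standardising them before choosing `h` is probably advisable.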
4.14 Perform a k-NN classification of the Breast Tissue data in order to discriminate
carcinoma cases from all other cases. Use the KNN program with the partition and edition
methods. Compare the results.
4.15 Consider the k-NN classification of the Rocks data, using two classes: {granites,
diorites, schists} vs. {limestones, marbles, breccias}.
a) Give an estimate of the number of neighbours, k, that should be used.
b) For the previously estimated k, what is the expected deviation of the asymptotic
error of the k-NN classifier from the Bayes error?
c) Perform the classification with the KNN program, using the partition and edition
methods. Compare the results.
4.16 Explain why all ROC curves start at (0,0) and finish at (1,1) by analysing what kind of
situations these points correspond to.
4.17 Consider the Breast Tissue dataset. Use the ROC curve approach to determine single
features that will discriminate carcinoma cases from all other cases. Compare the
alternative methods using the ROC curve areas.
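
For comparing single features by ROC curve area, as in Exercise 4.17, one possible approach is sketched below. It computes the area from the rank (Mann-Whitney) statistic, that is, the proportion of positive/negative pairs in which the positive case has the larger feature value, counting ties as one half. The function name `roc_area` and the toy values are hypothetical; this is not the procedure of any specific statistical package.

```python
def roc_area(values, labels):
    """Area under the ROC curve of a single feature.

    values: feature values, one per case.
    labels: 1 for the positive class (e.g. carcinoma), 0 for all other cases.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy feature: three of the four positive/negative pairs are correctly ordered.
print(roc_area([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

An area below 0.5 indicates that the feature discriminates in the reverse direction (low values point to the positive class), so features are more fairly ranked by how far the area is from 0.5.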
4.18 Repeat the ROC curve experiments illustrated in Figure 4.34 for the FHR Apgar
dataset, using combinations of features.
4.19 Increase the amplitude of the signal impulses by 20% in the Signal Noise dataset.
Consider the following impulse detection rule:
An impulse is detected at time n when s(n) is bigger than $\alpha \sum_{i=1}^{2} \left( s(n-i) + s(n+i) \right)$.
Determine the ROC curve corresponding to a variable $\alpha$, and determine the best $\alpha$ for
the impulse/noise discrimination. How does this method compare with the amplitude
threshold method described in section 4.3.3?
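
The detection rule of Exercise 4.19 can be sketched as follows. This is a minimal illustration on a synthetic, non-negative signal, not on the Signal Noise dataset; the function names `detect_impulses` and `roc_point`, the handling of the first and last two samples (they are skipped because their neighbours are missing) and the toy data are all assumptions made here.

```python
def detect_impulses(s, alpha):
    """Indices n where s[n] > alpha * sum over i = 1, 2 of (s[n-i] + s[n+i])."""
    detected = []
    for n in range(2, len(s) - 2):  # skip samples without two neighbours on each side
        neighbours = sum(s[n - i] + s[n + i] for i in (1, 2))
        if s[n] > alpha * neighbours:
            detected.append(n)
    return detected


def roc_point(s, is_impulse, alpha):
    """(false positive rate, true positive rate) of the rule for one value of alpha."""
    hits = set(detect_impulses(s, alpha))
    idx = range(2, len(s) - 2)
    tp = sum(1 for n in idx if is_impulse[n] and n in hits)
    fp = sum(1 for n in idx if not is_impulse[n] and n in hits)
    pos = sum(1 for n in idx if is_impulse[n])
    neg = sum(1 for n in idx if not is_impulse[n])
    return fp / neg, tp / pos


# Hypothetical toy signal with impulses at n = 3 and n = 8.
signal = [1, 1, 1, 5, 1, 1, 1, 1, 6, 1, 1, 1]
truth = [n in (3, 8) for n in range(len(signal))]

# Sweeping alpha traces the ROC curve point by point.
for alpha in (0.0, 0.5, 1.0, 2.0):
    print(alpha, roc_point(signal, truth, alpha))
```

Plotting the resulting points for a fine grid of `alpha` values gives the ROC curve; the same pairs of rates can be computed for an amplitude threshold to compare the two methods.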
4.20 Apply the branch-and-bound method to perform feature selection for the first two
classes of the Cork Stoppers data.
4.21 Repeat Exercises 4.4 and 4.5 performing sequential feature selection (direct and
dynamic).