142 4 Statistical Classification
Narendra P, Fukunaga K (1977) A Branch and Bound Algorithm for Feature Subset
Selection. IEEE Tr Comp 26:917-922.
Niemann H (1990) Pattern Analysis and Understanding. Springer Verlag.
Pipes LA (1977) Matrix-Computer Methods in Engineering. Krieger Pub. Co.
Raudys S, Pikelis V (1980) On dimensionality, sample size, classification error and
complexity of classification algorithm in pattern recognition. IEEE Tr Patt Anal Mach
Intel 2:242-252.
Schalkoff R (1992) Pattern Recognition. Wiley, New York.
Swets JA (1973) The Relative Operating Characteristic in Psychology. Science, 182:990-1000.
Shapiro SS, Wilk MB, Chen HJ (1968) A comparative study of various tests for normality. J
Am Stat Ass, 63:1343-1372.
Siegel S, Castellan NJ Jr (1988) Nonparametric Statistics for the Behavioral Sciences.
McGraw Hill, New York.
Specht DF (1990) Probabilistic Neural Networks. Neural Networks, 3:109-118.
Swain PH (1977) The decision tree classifier: Design and potential. IEEE Tr Geosci Elect,
15:142-147.
Toussaint GT (1974) Bibliography on Estimation of Misclassification. IEEE Tr Info Theory,
20:472-479.
Exercises
4.1 Consider the first two classes of the Cork Stoppers dataset, described by features ART
and PRT.
a) Determine the Euclidean and Mahalanobis classifiers using feature ART alone,
then using both ART and PRT.
b) Compute the Bayes error using a pooled covariance estimate as the true
covariance for both classes.
c) Determine whether the Mahalanobis classifiers are expected to be near the optimal
Bayesian classifier.
d) Using PR Size determine the average deviation of the training set error estimate
from the Bayes error, and the 95% confidence interval of the error estimate.
e) Determine the classification of one cork stopper using the correlation approach.
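As a starting point for items a) and b), the following is a minimal sketch of minimum-distance classification with a pooled covariance estimate. It assumes equal prior probabilities and Gaussian classes sharing one covariance matrix, in which case the Bayes error for two classes is 1 - Phi(delta/2), with delta the Mahalanobis distance between the class means. The data are synthetic stand-ins, not the Cork Stoppers values, and all function names are hypothetical.

```python
import math
import numpy as np

def pooled_covariance(samples_by_class):
    """Pooled covariance: summed within-class scatter divided by (n - c)."""
    n_total = sum(len(x) for x in samples_by_class)
    d = samples_by_class[0].shape[1]
    scatter = np.zeros((d, d))
    for x in samples_by_class:
        centered = x - x.mean(axis=0)
        scatter += centered.T @ centered
    return scatter / (n_total - len(samples_by_class))

def classify(x, means, cov=None):
    """Index of the nearest class mean.
    cov=None gives the Euclidean rule; otherwise the Mahalanobis rule
    with the same covariance matrix for every class."""
    inv = np.eye(len(x)) if cov is None else np.linalg.inv(cov)
    d2 = [float((x - m) @ inv @ (x - m)) for m in means]
    return int(np.argmin(d2))

def bayes_error_equal_cov(m1, m2, cov):
    """Bayes error for two Gaussian classes with equal priors and a
    shared covariance: 1 - Phi(delta / 2)."""
    diff = m1 - m2
    delta = math.sqrt(float(diff @ np.linalg.inv(cov) @ diff))
    return 0.5 * math.erfc(delta / (2.0 * math.sqrt(2.0)))

# Synthetic two-class, two-feature data (assumption: NOT the real dataset).
rng = np.random.default_rng(0)
shared = np.array([[1.0, 0.8], [0.8, 1.5]])
class1 = rng.multivariate_normal([0.0, 0.0], shared, size=50)
class2 = rng.multivariate_normal([2.0, 1.0], shared, size=50)

for cols in ([0], [0, 1]):  # first feature alone, then both features
    subsets = [class1[:, cols], class2[:, cols]]
    means = [s.mean(axis=0) for s in subsets]
    cov = pooled_covariance(subsets)
    x = subsets[0][0]  # one sample of the first class
    print(cols,
          classify(x, means),        # Euclidean decision
          classify(x, means, cov),   # Mahalanobis decision
          bayes_error_equal_cov(means[0], means[1], cov))
```

Replacing the synthetic arrays with the ART and PRT columns of the first two classes would give the quantities the exercise asks for, under the same equal-prior and shared-covariance assumptions.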
4.2 Consider the first two classes of the Cork Stoppers dataset, described by features ART
and PRT. Compute the linear discriminant corresponding to the Euclidean classifier
using formula 4-3c.
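The sketch below shows one common way to write the two-class linear discriminant of the nearest-mean (Euclidean) rule: d(x) = (m1 - m2)'x - 0.5(||m1||^2 - ||m2||^2), obtained by expanding the two squared distances and cancelling the common ||x||^2 term. The notation of formula 4-3c may differ from this form. The class means are made-up values and the function name is hypothetical.

```python
import numpy as np

def euclidean_linear_discriminant(m1, m2):
    """Return (w, w0) such that d(x) = w @ x + w0 is positive exactly
    when x is closer, in Euclidean distance, to m1 than to m2."""
    w = m1 - m2
    w0 = -0.5 * (m1 @ m1 - m2 @ m2)
    return w, w0

# Made-up class means for two features (assumption: not the real dataset).
m1 = np.array([1.0, 2.0])
m2 = np.array([3.0, 5.0])
w, w0 = euclidean_linear_discriminant(m1, m2)

# Cross-check the sign of d(x) against the nearest-mean rule.
x = np.array([1.5, 2.5])
d = w @ x + w0
nearest = 1 if np.linalg.norm(x - m1) < np.linalg.norm(x - m2) else 2
print("w =", w, "w0 =", w0, "d(x) =", d, "nearest mean: class", nearest)
```

The decision boundary d(x) = 0 is the hyperplane through the midpoint of the two means, perpendicular to the line joining them.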
4.3 Repeat the previous exercises for the three classes of the Cork Stoppers dataset, using
features N, PRM and ARTG. Compute the pooled covariance matrix and determine the
influence of small changes in its values on the classifier performance.
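One possible way to probe the sensitivity asked for here is to add small random symmetric perturbations to the pooled covariance matrix and watch how the training set error of the Mahalanobis classifier moves. The sketch below does this on synthetic three-class, three-feature data standing in for N, PRM and ARTG. The perturbation scheme, the relative sizes tried, and all function names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for three classes and three features
# (assumption: NOT the Cork Stoppers values).
true_means = [np.array([0.0, 0.0, 0.0]),
              np.array([2.0, 1.0, 0.0]),
              np.array([4.0, 0.0, 1.0])]
classes = [rng.normal(size=(50, 3)) + m for m in true_means]

def pooled_covariance(samples_by_class):
    """Pooled covariance: summed within-class scatter divided by (n - c)."""
    n_total = sum(len(x) for x in samples_by_class)
    d = samples_by_class[0].shape[1]
    scatter = np.zeros((d, d))
    for x in samples_by_class:
        centered = x - x.mean(axis=0)
        scatter += centered.T @ centered
    return scatter / (n_total - len(samples_by_class))

def training_error(samples_by_class, cov):
    """Fraction of training samples misclassified by the Mahalanobis
    minimum-distance rule with a covariance shared by all classes."""
    means = [x.mean(axis=0) for x in samples_by_class]
    inv = np.linalg.inv(cov)
    errors, total = 0, 0
    for label, x in enumerate(samples_by_class):
        # Squared Mahalanobis distance of every sample to every class mean.
        d2 = np.stack(
            [np.einsum("ij,jk,ik->i", x - m, inv, x - m) for m in means],
            axis=1)
        errors += int(np.sum(d2.argmin(axis=1) != label))
        total += len(x)
    return errors / total

def perturb(cov, rel, rng):
    """Add symmetric Gaussian noise scaled to rel times the largest entry.
    For large rel the result may stop being positive definite."""
    noise = rel * np.abs(cov).max() * rng.standard_normal(cov.shape)
    return cov + (noise + noise.T) / 2.0

base_cov = pooled_covariance(classes)
print("unperturbed error:", training_error(classes, base_cov))
for rel in (0.01, 0.05, 0.10):
    errs = [training_error(classes, perturb(base_cov, rel, rng))
            for _ in range(20)]
    print(f"rel={rel:.2f}  mean error={np.mean(errs):.3f}  max={np.max(errs):.3f}")
```

The training set error is an optimistic estimate, so a test set or cross-validated error would be a reasonable substitute inside `training_error` if enough samples are available.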
4.4 Consider the problem of classifying cardiotocograms (CTG dataset) into three classes:
N (normal), S (suspect) and P (pathological).
a) Determine which features are most discriminative and appropriate for a
Mahalanobis classifier approach for this problem.