b) Design the classifier and estimate its performance using a partition method for the
test set error estimation.
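
A minimal sketch of test set error estimation by partitions, assuming the partition method is read as rotating m subsets so that each one serves once as the test set while the others are used for design. The data are synthetic one-dimensional values, and the nearest-mean rule and the names `nearest_mean_classifier` and `partition_error` are illustrative choices, not part of the exercise.

```python
import random

def nearest_mean_classifier(design):
    """Design step: return a rule assigning x to the class with the closest mean."""
    means = {}
    for label in {lab for _, lab in design}:
        values = [x for x, lab in design if lab == label]
        means[label] = sum(values) / len(values)
    return lambda x: min(means, key=lambda lab: abs(x - means[lab]))

def partition_error(cases, m, seed=0):
    """Average test set error over m rotations of design and test partitions."""
    shuffled = cases[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::m] for i in range(m)]  # m disjoint partitions
    errors = []
    for i, test in enumerate(folds):
        # Design with all partitions except the i-th, test on the i-th.
        design = [c for j, fold in enumerate(folds) if j != i for c in fold]
        classify = nearest_mean_classifier(design)
        wrong = sum(classify(x) != lab for x, lab in test)
        errors.append(wrong / len(test))
    return sum(errors) / m

# Synthetic stand-in for a real dataset: two classes, one feature.
rng = random.Random(1)
cases = [(rng.gauss(0.0, 1.0), "A") for _ in range(50)]
cases += [(rng.gauss(2.0, 1.0), "B") for _ in range(50)]

error_estimate = partition_error(cases, m=5)
print(f"partition (m=5) test set error estimate: {error_estimate:.3f}")
```

Since every case is used for testing exactly once, the estimate uses the whole dataset without testing any case on a classifier that was designed with it.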
4.5 Repeat the previous exercise using the Rocks dataset and two classes: (granites) vs.
(limestones, marbles).
4.6 Apply a linear discriminant to the projections of the cork stoppers two-dimensional
data (first two classes) along the Fisher direction, as explained in section 4.1.4. Show
that the results obtained are the same as those found with the linear discriminant on the
two-dimensional data.
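
The equivalence can be checked numerically before working with the real data. The sketch below uses synthetic two-class Gaussian data (not the cork stoppers measurements), assumes equal priors and a pooled covariance estimate, and takes the Fisher direction as w = S^-1 (m1 - m2); the helper names are hypothetical.

```python
import numpy as np

def pooled_cov(a, b):
    """Pooled covariance estimate of two samples (rows are cases)."""
    na, nb = len(a), len(b)
    sa = np.atleast_2d(np.cov(a, rowvar=False))
    sb = np.atleast_2d(np.cov(b, rowvar=False))
    return ((na - 1) * sa + (nb - 1) * sb) / (na + nb - 2)

def linear_discriminant(a, b):
    """Return (w, w0) such that x @ w + w0 > 0 assigns x to class a (equal priors)."""
    a = a.reshape(len(a), -1)
    b = b.reshape(len(b), -1)
    ma, mb = a.mean(axis=0), b.mean(axis=0)
    w = np.linalg.solve(pooled_cov(a, b), ma - mb)
    w0 = -0.5 * w @ (ma + mb)
    return w, w0

# Synthetic stand-in for the two-dimensional, two-class data.
rng = np.random.default_rng(0)
cov = [[1.0, 0.6], [0.6, 1.5]]
class1 = rng.multivariate_normal([0.0, 0.0], cov, size=50)
class2 = rng.multivariate_normal([2.0, 1.0], cov, size=50)
x_all = np.vstack([class1, class2])

# Linear discriminant on the original two-dimensional data.
w, w0 = linear_discriminant(class1, class2)
labels_2d = x_all @ w + w0 > 0

# Project onto the Fisher direction w, then design a one-dimensional discriminant.
v, v0 = linear_discriminant(class1 @ w, class2 @ w)
labels_1d = (x_all @ w) * v[0] + v0 > 0

same_labels = bool(np.array_equal(labels_2d, labels_1d))
print("identical decisions:", same_labels, " 1-D weight:", v[0])
```

The one-dimensional weight equals 1 up to rounding because the pooled variance of the projections, w'Sw, coincides with the distance between the projected means, so both discriminants reduce to the same function of x.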
4.7 Consider the Fruits images dataset. Process the images in order to obtain interesting
colour and shape features (a popular picture processing program, such as
Micrografx Picture Publisher, can be used for this purpose). Design a Bayesian
classifier for the 3-class fruit discrimination. Comment on the results obtained.
4.8 A physician would like to have a very simple rule available for screening out the
carcinoma situations from all other situations, using the same diagnostic means and
measurements as in the Breast Tissue dataset.
a) Using the Breast Tissue dataset, find a linear Bayesian classifier with only one
feature for the discrimination of carcinoma versus all other cases (relax the
normality and equal variance requirements). Use forward and backward search
and estimate the priors from the training set sizes of the classes.
b) Obtain training set and test set error estimates of this classifier and 95%
confidence intervals.
c) Using the PR Size program, assess the deviation of the error estimate from the true
Bayesian error, assuming that the normality and equal variance requirements were
satisfied.
d) Suppose that the risk of missing a carcinoma is three times higher than the risk of
misclassifying a non-carcinoma case. How should the classifying rule be
reformulated in order to reflect these risks, and what is the performance of the new
rule?
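
The computations behind parts b) and d) can be sketched as follows, assuming a one-feature Gaussian model with equal class variances, a normal approximation for the confidence interval of an error estimate, and the carcinoma class as class 1 with the larger mean. All numerical values (means, standard deviation, prior, error estimate, sample size) are hypothetical and are not taken from the Breast Tissue dataset.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def error_interval(pe, n, z=1.96):
    """Approximate 95% interval for an error estimate pe obtained from n cases."""
    half = z * math.sqrt(pe * (1.0 - pe) / n)
    return max(0.0, pe - half), min(1.0, pe + half)

def threshold(m1, m2, sigma, p1, cost_miss=1.0, cost_false_alarm=1.0):
    """Decision boundary on x; class 1 is chosen for x above it when m1 > m2.

    cost_miss is the cost of assigning a class 1 case to class 2,
    cost_false_alarm the cost of assigning a class 2 case to class 1.
    """
    p2 = 1.0 - p1
    log_ratio = math.log((cost_false_alarm * p2) / (cost_miss * p1))
    return 0.5 * (m1 + m2) + sigma ** 2 / (m1 - m2) * log_ratio

def error_rates(x_star, m1, m2, sigma):
    """(missed class 1 rate, false alarm rate) under the model, assuming m1 > m2."""
    miss = normal_cdf((x_star - m1) / sigma)
    false_alarm = 1.0 - normal_cdf((x_star - m2) / sigma)
    return miss, false_alarm

# Hypothetical class statistics: class 1 = carcinoma, class 2 = all other cases.
m1, m2, sigma, p1 = 2.0, 0.0, 1.0, 0.2

plain = threshold(m1, m2, sigma, p1)
risky = threshold(m1, m2, sigma, p1, cost_miss=3.0, cost_false_alarm=1.0)
miss_plain, fa_plain = error_rates(plain, m1, m2, sigma)
miss_risky, fa_risky = error_rates(risky, m1, m2, sigma)

print(f"minimum-error threshold {plain:.3f}: miss {miss_plain:.3f}, false alarm {fa_plain:.3f}")
print(f"minimum-risk threshold  {risky:.3f}: miss {miss_risky:.3f}, false alarm {fa_risky:.3f}")
print("interval for a hypothetical estimate of 0.10 on 100 cases:", error_interval(0.10, 100))
```

Weighting a missed carcinoma three times as heavily moves the threshold towards the mean of the other class, so fewer carcinomas are missed at the price of more false alarms.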
4.9 Study the influence that using a pooled covariance matrix for the Norm2c2d dataset has
on the training set error estimate. For this purpose, perform the following
computations:
a) Change the off-diagonal elements of one of the covariance matrices by a small
amount (e.g. 10%).
b) Compute the training set errors using a quadratic classifier with the individual
covariance matrices.
c) Compute the training set errors using a linear classifier with the pooled covariance
matrix.
d) Compare the results obtained in b) and c).
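
Steps a) to d) can be sketched as follows. The data are synthetic two-class, two-dimensional Gaussian samples standing in for the dataset of the exercise, priors are estimated from the class sizes, and the function names are hypothetical.

```python
import numpy as np

def log_scores(x, mean, cov, prior):
    """Log of prior times Gaussian density, up to a constant common to all classes."""
    diff = x - mean
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return np.log(prior) - 0.5 * maha - 0.5 * np.log(np.linalg.det(cov))

def training_error(samples, covs):
    """Training set error of the Gaussian classifier using the given covariances."""
    n_total = sum(len(s) for s in samples)
    x = np.vstack(samples)
    truth = np.concatenate([np.full(len(s), k) for k, s in enumerate(samples)])
    scores = np.column_stack([
        log_scores(x, s.mean(axis=0), c, len(s) / n_total)
        for s, c in zip(samples, covs)
    ])
    return float(np.mean(scores.argmax(axis=1) != truth))

# Synthetic stand-in data: two classes sharing the same true covariance.
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.6], [0.6, 1.5]])
samples = [
    rng.multivariate_normal([0.0, 0.0], true_cov, size=100),
    rng.multivariate_normal([2.0, 1.0], true_cov, size=100),
]
covs = [np.cov(s, rowvar=False) for s in samples]

# a) Change the off-diagonal elements of one covariance matrix by 10%.
off_diagonal = ~np.eye(2, dtype=bool)
covs[1][off_diagonal] *= 1.10

# b) Quadratic classifier: each class keeps its own covariance matrix.
err_quadratic = training_error(samples, covs)

# c) Linear classifier: both classes share the pooled covariance matrix.
n1, n2 = len(samples[0]), len(samples[1])
pooled = ((n1 - 1) * covs[0] + (n2 - 1) * covs[1]) / (n1 + n2 - 2)
err_linear = training_error(samples, [pooled, pooled])

# d) Compare the two training set errors.
print(f"quadratic: {err_quadratic:.3f}  linear (pooled): {err_linear:.3f}")
```

Repeating the run with larger perturbations in step a) gives a feel for how far the covariance matrices can differ before pooling them changes the error estimate noticeably.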
4.10 Determine the reject threshold that should be used for the carcinoma classifier of
Exercise 4.8, such that: a) no carcinoma is misclassified; b) only 5% of the carcinomas
are misclassified. Also determine the decision rules for these situations.
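
A possible way to search for such thresholds is sketched below. It assumes a two-class rule in which a case is rejected whenever its largest posterior probability does not exceed the reject threshold, and it works on a made-up list of posterior probabilities of carcinoma for 20 carcinoma cases; neither the values nor the function names come from the Breast Tissue dataset or the book.

```python
import math

def decide(p_carcinoma, t):
    """Decision rule with reject threshold t (0.5 <= t < 1)."""
    if p_carcinoma > t:
        return "carcinoma"
    if 1.0 - p_carcinoma > t:
        return "other"
    return "reject"

def missed_fraction(carcinoma_posteriors, t):
    """Fraction of the carcinoma cases assigned to the other class."""
    missed = sum(decide(p, t) == "other" for p in carcinoma_posteriors)
    return missed / len(carcinoma_posteriors)

def reject_threshold(carcinoma_posteriors, max_missed_fraction):
    """Smallest threshold leaving at most the given fraction of carcinomas misclassified."""
    n = len(carcinoma_posteriors)
    # A carcinoma with posterior p < 0.5 goes to the other class with confidence 1 - p.
    wrong_conf = sorted(1.0 - p for p in carcinoma_posteriors if p < 0.5)
    allowed = math.floor(max_missed_fraction * n + 1e-9)  # errors we may keep
    n_to_reject = len(wrong_conf) - allowed
    if n_to_reject <= 0:
        return 0.5  # no rejection needed beyond exact ties
    # Reject the least confident errors; the threshold is the largest of their confidences.
    return wrong_conf[n_to_reject - 1]

# Hypothetical posterior probabilities of carcinoma for 20 true carcinoma cases.
posteriors = [0.99, 0.97, 0.95, 0.93, 0.91, 0.90, 0.88, 0.85, 0.83, 0.80,
              0.78, 0.75, 0.72, 0.70, 0.66, 0.62, 0.58, 0.45, 0.40, 0.30]

t_none = reject_threshold(posteriors, 0.0)   # a) no carcinoma misclassified
t_five = reject_threshold(posteriors, 0.05)  # b) at most 5% misclassified

for name, t in (("a)", t_none), ("b)", t_five)):
    print(name, "threshold", round(t, 3), "missed fraction", missed_fraction(posteriors, t))
```

The sketch looks only at the carcinoma cases. Raising the threshold also rejects cases from the other class and correctly classified carcinomas with low confidence, so the reject rate should be reported together with the error rate.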