

                                    b)  Design the classifier and estimate its performance using a partition method for the
                                        test set error estimation.
                                 4.5  Repeat the previous exercise using the Rocks dataset and two classes: (granites) vs.
                                    (limestones, marbles).
                                 4.6  Apply a linear discriminant to the projections of the cork stoppers two-dimensional
                                    data (first two classes) along the Fisher direction, as explained in section 4.1.4. Show
                                    that the results obtained are the same as those found with the linear discriminant.
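
The equivalence asked for in Exercise 4.6 can be checked numerically. The following is only a minimal sketch under stated assumptions: it uses synthetic two-dimensional normal data (not the cork stoppers dataset), equal priors, and a pooled covariance estimate; all function and variable names are hypothetical.

```python
import random

def mean2(pts):
    # Sample mean of a list of 2-D points.
    n = len(pts)
    return [sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n]

def scatter2(pts, m):
    # 2x2 scatter matrix (sum of outer products of deviations from m).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in pts:
        d = (p[0] - m[0], p[1] - m[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

# Synthetic stand-in data (assumption): two normal classes in two dimensions.
rng = random.Random(0)
class1 = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
class2 = [(rng.gauss(2, 1), rng.gauss(1, 1)) for _ in range(50)]

m1, m2 = mean2(class1), mean2(class2)
s1, s2 = scatter2(class1, m1), scatter2(class2, m2)
n = len(class1) + len(class2)

# Pooled covariance matrix and Fisher direction w = C^-1 (m1 - m2).
c = [[(s1[i][j] + s2[i][j]) / (n - 2) for j in range(2)] for i in range(2)]
det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
dm = (m1[0] - m2[0], m1[1] - m2[1])
w = ((c[1][1] * dm[0] - c[0][1] * dm[1]) / det,
     (-c[1][0] * dm[0] + c[0][0] * dm[1]) / det)

def classify_2d(x):
    # Linear discriminant in the original space, equal priors.
    mid = ((m1[0] + m2[0]) / 2, (m1[1] + m2[1]) / 2)
    return 1 if w[0] * (x[0] - mid[0]) + w[1] * (x[1] - mid[1]) > 0 else 2

def project(x):
    # Projection of x along the Fisher direction.
    return w[0] * x[0] + w[1] * x[1]

# One-dimensional linear discriminant on the projections:
# with equal priors the threshold is halfway between the projected means.
proj1 = [project(x) for x in class1]
proj2 = [project(x) for x in class2]
threshold = (sum(proj1) / len(proj1) + sum(proj2) / len(proj2)) / 2

def classify_1d(x):
    return 1 if project(x) > threshold else 2

points = class1 + class2
agreement = sum(classify_2d(x) == classify_1d(x) for x in points)
print(agreement, "of", len(points), "decisions agree")
```

Both rules compare the same quantity, the projection of x onto w, with the projection of the midpoint between the class means, which is why the decisions coincide.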

                                 4.7  Consider the Fruits images dataset. Process the images in order to obtain interesting
                                    colour and shape features (a popular picture processing program, such as the
                                    Micrografx Picture Publisher, can be used for this purpose). Design a Bayesian
                                    classifier for the 3-class fruit discrimination. Comment on the results obtained.

                                 4.8  A physician would like to have a very simple rule available for screening out the
                                    carcinoma situations from all other situations, using the same diagnostic means and
                                    measurements as in the Breast Tissue dataset.
                                    a)  Using the Breast Tissue dataset, find a linear Bayesian classifier with only one
                                        feature for the discrimination of carcinoma versus all other cases (relax the
                                        normality and equal variance requirements). Use forward and backward search
                                        and estimate the priors from the training set sizes of the classes.
                                    b)  Obtain training set and test set error estimates of this classifier and 95%
                                        confidence intervals.
                                    c)  Using the PR Size program, assess the deviation of the error estimate from the true
                                        Bayesian error, assuming that the normality and equal variance requirements were
                                        satisfied.
                                    d)  Suppose that the risk of missing a carcinoma is three times higher than the risk of
                                        misclassifying a non-carcinoma case. How should the classifying rule be
                                        reformulated in order to reflect these risks, and what is the performance of the new
                                        rule?
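
For item d), the effect of unequal risks on a one-feature rule can be sketched as follows. This is an illustration only, assuming two normal classes with equal variance, so that the minimum-risk rule reduces to a single threshold on the feature. The means, standard deviation, priors and the function names `risk_threshold` and `decide` are hypothetical and are not taken from the Breast Tissue dataset.

```python
import math

def risk_threshold(mu_c, mu_n, sigma, p_c, p_n, loss_miss, loss_false_alarm):
    """Threshold t of the minimum-risk rule for two equal-variance normal classes.

    mu_c, mu_n : class means of the feature (carcinoma, non-carcinoma)
    sigma      : common standard deviation
    p_c, p_n   : prior probabilities
    loss_miss  : loss of deciding non-carcinoma for a carcinoma case
    loss_false_alarm : loss of deciding carcinoma for a non-carcinoma case

    The rule decides carcinoma when (mu_c - mu_n) * (x - t) > 0.
    """
    midpoint = (mu_c + mu_n) / 2
    log_ratio = math.log((loss_false_alarm * p_n) / (loss_miss * p_c))
    return midpoint + sigma ** 2 / (mu_c - mu_n) * log_ratio

def decide(x, t, mu_c, mu_n):
    return "carcinoma" if (mu_c - mu_n) * (x - t) > 0 else "other"

# Hypothetical values: equal priors, carcinoma mean above the other mean.
t_equal = risk_threshold(3.0, 1.0, 1.0, 0.5, 0.5, 1.0, 1.0)
t_risk = risk_threshold(3.0, 1.0, 1.0, 0.5, 0.5, 3.0, 1.0)
print("equal losses:", t_equal, " miss three times as costly:", t_risk)
print(decide(1.7, t_equal, 3.0, 1.0), decide(1.7, t_risk, 3.0, 1.0))
```

With the higher loss for missed carcinomas, the threshold moves towards the non-carcinoma mean, so borderline cases are assigned to the carcinoma class. The performance of the new rule still has to be estimated on the data, as the exercise asks.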



                                 4.9  Study the influence that using a pooled covariance matrix for the Norm2c2d dataset has
                                     on the training set error estimate. For this purpose, perform the following
                                     computations:
                                     a)  Change the off-diagonal elements of one of the covariance matrices by a small
                                        amount (e.g. 10%).
                                     b)  Compute the training set errors using a quadratic classifier with the individual
                                        covariance matrices.
                                     c)  Compute the training set errors using a linear classifier with the pooled covariance
                                        matrix.
                                     d)  Compare the results obtained in b) and c).
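
The computations a) to d) can be sketched as follows. This is a minimal illustration under stated assumptions, not the book's procedure: the data are synthetic two-class, two-dimensional normal samples generated here (the means and covariance values are hypothetical), priors are taken as equal, and all function names are invented for the example.

```python
import math
import random

def sample(mean, cov, n, rng):
    # Draw n points from a 2-D normal using the Cholesky factor of cov.
    a = math.sqrt(cov[0][0])
    b = cov[0][1] / a
    c = math.sqrt(cov[1][1] - b * b)
    pts = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        pts.append((mean[0] + a * z1, mean[1] + b * z1 + c * z2))
    return pts

def estimate(pts):
    # Sample mean and unbiased sample covariance matrix.
    n = len(pts)
    m = [sum(p[j] for p in pts) / n for j in range(2)]
    cov = [[sum((p[i] - m[i]) * (p[j] - m[j]) for p in pts) / (n - 1)
            for j in range(2)] for i in range(2)]
    return m, cov

def log_gauss(x, m, cov):
    # Log of the 2-D normal density, up to an additive constant.
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    d0, d1 = x[0] - m[0], x[1] - m[1]
    maha = (cov[1][1] * d0 * d0 - 2 * cov[0][1] * d0 * d1
            + cov[0][0] * d1 * d1) / det
    return -0.5 * maha - 0.5 * math.log(det)

def training_error(data, params):
    # data: list of (point, label); params: {label: (mean, cov)}.
    wrong = 0
    for x, label in data:
        best = max(params, key=lambda k: log_gauss(x, params[k][0], params[k][1]))
        if best != label:
            wrong += 1
    return wrong / len(data)

rng = random.Random(1)
cov_a = [[1.0, 0.5], [0.5, 1.0]]
cov_b = [[1.0, 0.55], [0.55, 1.0]]   # a) off-diagonal elements changed by 10%
class1 = sample((0.0, 0.0), cov_a, 200, rng)
class2 = sample((2.0, 0.0), cov_b, 200, rng)
data = [(x, 1) for x in class1] + [(x, 2) for x in class2]

m1, s1 = estimate(class1)
m2, s2 = estimate(class2)
n1, n2 = len(class1), len(class2)
pooled = [[((n1 - 1) * s1[i][j] + (n2 - 1) * s2[i][j]) / (n1 + n2 - 2)
           for j in range(2)] for i in range(2)]

# b) individual covariance matrices give a quadratic decision boundary.
err_quadratic = training_error(data, {1: (m1, s1), 2: (m2, s2)})
# c) a shared pooled matrix cancels the quadratic terms, giving a linear one.
err_linear = training_error(data, {1: (m1, pooled), 2: (m2, pooled)})
# d) compare the two training set error estimates.
print("quadratic:", err_quadratic, " linear (pooled):", err_linear)
```

Repeating the run with larger perturbations of `cov_b` gives a feel for how far the covariance matrices can differ before pooling visibly degrades the training set error.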

                                  4.10 Determine the reject threshold that should be used for the carcinoma classifier of
                                     Exercise 4.8, such that: a) no carcinoma is misclassified; b) only 5% of the carcinomas
                                     are misclassified. Also determine the decision rules for these situations.
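
One possible way to search for such a threshold is sketched below. It rests on assumptions that should be checked against the text's definition of the reject option: a case is accepted only when its largest posterior probability exceeds the threshold, and is rejected otherwise. The posterior values are invented for illustration, and `reject_threshold` and `missed_fraction` are hypothetical helper names.

```python
import math

def reject_threshold(post_carcinoma, max_miss_fraction):
    """Smallest threshold keeping the misclassified carcinomas within the limit.

    post_carcinoma    : posterior P(carcinoma | x) for each true carcinoma case
    max_miss_fraction : tolerated fraction of misclassified carcinomas
    Assumed rule: accept a decision only if the largest posterior exceeds
    the threshold, otherwise reject the case.
    """
    n = len(post_carcinoma)
    allowed = math.floor(max_miss_fraction * n + 1e-9)
    # Posterior of the wrong class for carcinomas that the plain rule misses.
    wrong = sorted((1.0 - p for p in post_carcinoma if 1.0 - p > 0.5),
                   reverse=True)
    if len(wrong) <= allowed:
        return 0.5   # no reject region is needed
    return wrong[allowed]

def missed_fraction(post_carcinoma, threshold):
    # A carcinoma is missed when the wrong class wins and the case is accepted.
    missed = sum(1 for p in post_carcinoma
                 if 1.0 - p > 0.5 and 1.0 - p > threshold)
    return missed / len(post_carcinoma)

# Hypothetical posteriors for 20 carcinoma cases (illustrative values only).
posteriors = [0.99, 0.98, 0.97, 0.96, 0.95, 0.93, 0.91, 0.90, 0.88, 0.85,
              0.82, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.45, 0.30, 0.20]

t_none = reject_threshold(posteriors, 0.0)    # a) no carcinoma misclassified
t_five = reject_threshold(posteriors, 0.05)   # b) at most 5% misclassified
print(t_none, missed_fraction(posteriors, t_none))
print(t_five, missed_fraction(posteriors, t_five))
```

The sketch looks only at the carcinoma cases. A complete answer would also report how many cases of both classes fall in the reject region for each threshold.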