
144     4 Statistical Classification

                                4.11 Repeat Exercise 4.4, considering only two classes: N and P. Determine afterwards
                                    which reject threshold best matches the S (suspect) cases.

                                4.12 Use the Parzen.xls file to repeat the experiments shown in Figure 4.28 for other types
                                    of distributions, namely the normal and the logistic distributions.

                                4.13 Apply the Parzen window method to the first two classes of the cork stoppers data with
                                    features  N  and  PRT10,  using  the  probabilistic  neural  network  approach  for pattern
                                    classification. Also  use  the weight  values  to  derive  the probability  density  estimates
                                    (limit the training set to 10 cases per class and use Microsoft Excel).
                                4.14 Perform  a  k-NN  classification  of  the  Breast  Tissue  data  in  order  to  discriminate
                                    carcinoma cases from all other cases. Use the KNN program in the partition and edition
                                    methods. Compare the results.

                                4.15 Consider the k-NN classification of the Rocks data, using two classes: {granites,
                                    diorites, schists} vs. {limestones, marbles, breccias}.
                                    a)  Give an estimate of the number of neighbours, k, that should be used.
                                    b)  For the previously  estimated k,  what is the expected deviation of  the asymptotic
                                        error of the k-NN classifier from the Bayes error?
                                    c)  Perform the classification with the KNN program, using the partition and edition
                                        methods. Compare the results.

                                4.16 Explain why all ROC curves start at (0,0) and finish at (1,1) by analysing what kind of
                                    situations these points correspond to.

                                4.17 Consider the Breast  Tissue dataset. Use the ROC curve approach  to determine single
                                    features  that  will  discriminate  carcinoma  cases  from  all  other  cases.  Compare the
                                    alternative methods using the ROC curve areas.
                                4.18 Repeat  the  ROC  curve  experiments  illustrated  in  Figure  4.34  for  the  FHR  Apgar
                                    dataset, using combinations of features.

                                4.19 Increase  the  amplitude  of  the  signal  impulses  by  20%  in  the  Signal  Noise  dataset.
                                    Consider the following impulse detection rule:

                                    An impulse is detected at time n when $s(n) > \alpha \sum_{i=1}^{2} \big(s(n-i) + s(n+i)\big)$.
                                    Determine the ROC curve corresponding to a variable $\alpha$, and determine the best $\alpha$ for
                                    the impulse/noise discrimination. How does this method compare with the amplitude
                                    threshold method described in section 4.3.3?

                                 4.20 Apply  the  branch-and-bound  method  to  perform  feature  selection  for  the  first two
                                    classes of the Cork Stoppers data.

                                 4.21  Repeat  Exercises  4.4  and  4.5  performing  sequential  feature  selection  (direct  and
                                   dynamic).
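
The detection rule of Exercise 4.19 can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's software. The function names `detect_impulses` and `roc_point`, the toy signal and the impulse positions are all hypothetical. The sketch also assumes that the sum runs over i = 1, 2 and that samples too close to the signal edges are skipped. The real Signal Noise dataset would have to be loaded in place of the toy signal.

```python
def detect_impulses(s, alpha, half_width=2):
    """Indices n where s[n] > alpha * sum_{i=1..half_width} (s[n-i] + s[n+i]).

    Samples closer than half_width to either edge are skipped (an assumption).
    """
    detected = []
    for n in range(half_width, len(s) - half_width):
        neighbours = sum(s[n - i] + s[n + i] for i in range(1, half_width + 1))
        if s[n] > alpha * neighbours:
            detected.append(n)
    return detected


def roc_point(s, impulse_positions, alpha, half_width=2):
    """One ROC point (false positive rate, true positive rate) for a given alpha."""
    candidates = set(range(half_width, len(s) - half_width))
    detected = set(detect_impulses(s, alpha, half_width))
    truth = set(impulse_positions) & candidates
    negatives = candidates - truth
    tpr = len(detected & truth) / len(truth)
    fpr = len(detected & negatives) / len(negatives)
    return fpr, tpr


# Hypothetical toy signal: positive noise floor with impulses at n = 4 and n = 10.
signal = [1.0, 1.2, 0.8, 1.1, 6.0, 0.9, 1.0, 1.3, 0.7, 1.0, 5.5, 1.1, 0.9, 1.0]
impulses = [4, 10]

# Sweeping alpha traces the ROC curve one point at a time.
for alpha in (0.0, 0.1, 0.5, 1.0, 2.0):
    print(alpha, roc_point(signal, impulses, alpha))
```

Each value of alpha yields one (false positive rate, true positive rate) pair, so a fine sweep of alpha gives the ROC curve asked for in the exercise. On a positive signal, alpha = 0 flags every sample and a very large alpha flags none.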