Page 225 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB

214                         FEATURE EXTRACTION AND SELECTION




            6.5   EXERCISES

            1. Prove equation (6.4). ( )
               Hint: use $z_{k,n} - z_{l,m} = (z_{k,n} - \hat{\mu}_k) + (\hat{\mu}_k - \hat{\mu}) + (\hat{\mu} - \hat{\mu}_l) + (\hat{\mu}_l - z_{l,m})$.
            2. Develop an algorithm that creates a tree structure like the one in Figure 6.4. Can
               you adapt that algorithm such that the tree becomes minimal (thus, without the
               superfluous twigs)? (0)
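One way to see the minimality condition of Exercise 2 is to branch only on positions that still leave enough higher-indexed features to delete. A Python sketch (illustrative only; the book's own listings use MATLAB/PRTools, and all names here are made up):

```python
def subset_tree(features, ndel, start=0):
    """Yield (ndel_left, subset) nodes of the feature-deletion tree.

    features: sorted tuple of feature indices at this node.
    ndel:     number of features still to be deleted below this node.
    Branching only on positions start .. len(features) - ndel keeps every
    path completable, so no superfluous twigs arise and every subset of
    the target size appears exactly once.
    """
    yield (ndel, features)
    if ndel == 0:
        return
    for i in range(start, len(features) - ndel + 1):
        child = features[:i] + features[i + 1:]
        # children may only delete at position i or later in the child
        yield from subset_tree(child, ndel - 1, i)

# Example: reduce D = 4 features to d = 2.
nodes = list(subset_tree((0, 1, 2, 3), ndel=2))
leaves = [s for nd, s in nodes if nd == 0]
```

The leaves are exactly the C(4, 2) = 6 two-feature subsets, each generated once; dropping the `start` argument or the `- ndel` bound reintroduces the superfluous twigs.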
            3. Under what circumstances would it be advisable to use forward selection, or plus-
              l-takeaway-r selection with l > r? And backward selection, or plus-l-takeaway-r selec-
              tion with l < r? (0)
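A toy sketch of the procedure Exercise 3 refers to (a Python stand-in, not a PRTools routine; `J` is a placeholder criterion and the weights below are invented numbers):

```python
def plus_l_takeaway_r(all_feats, k, l, r, J):
    """Greedy plus-l-takeaway-r selection: add the l best features one
    at a time, then drop the r least useful, until k features remain.
    J(subset) -> float is the selection criterion (a stand-in here);
    l and r are assumed to make a subset of size k reachable.
    With r = 0 this reduces to plain forward selection."""
    S = set()
    while len(S) != k:
        for _ in range(l):                      # "plus l": forward steps
            rest = [f for f in all_feats if f not in S]
            if not rest:
                break
            S.add(max(rest, key=lambda f: J(S | {f})))
        for _ in range(r):                      # "takeaway r": backward steps
            if len(S) <= 1:
                break
            S.remove(max(S, key=lambda f: J(S - {f})))
    return S

# Toy additive criterion: each feature has a fixed usefulness.
weight = {0: 0.9, 1: 0.8, 2: 0.5, 3: 0.3, 4: 0.1}
J = lambda S: sum(weight[f] for f in S)
picked = plus_l_takeaway_r(range(5), k=2, l=2, r=1, J=J)
```

With l > r the net movement is forward, which suits selecting a few features out of many; with l < r the loop would have to start from the full set and move backward.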
            4. Prove that $W = \mathbf{m}^T \mathbf{C}^{-1}$ is the feature extractor that maximizes the Bhattacharyya
               distance in the two-class Gaussian case with equal covariance matrices. (  )
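As a reminder (a standard result, not derived on this page): for two Gaussian densities with means $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2$ and covariances $\mathbf{C}_1, \mathbf{C}_2$ the Bhattacharyya distance is

```latex
J_B = \tfrac{1}{8}\,\mathbf{m}^T
      \left(\tfrac{\mathbf{C}_1+\mathbf{C}_2}{2}\right)^{-1}\mathbf{m}
      + \tfrac{1}{2}\ln\frac{\bigl|\tfrac{1}{2}(\mathbf{C}_1+\mathbf{C}_2)\bigr|}
                            {\sqrt{|\mathbf{C}_1|\,|\mathbf{C}_2|}},
\qquad \mathbf{m}=\boldsymbol{\mu}_2-\boldsymbol{\mu}_1 .
```

With equal covariances $\mathbf{C}_1=\mathbf{C}_2=\mathbf{C}$ the logarithmic term vanishes and $J_B = \tfrac{1}{8}\,\mathbf{m}^T\mathbf{C}^{-1}\mathbf{m}$, which is the quantity the linear extractor of Exercise 4 must maximize.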
            5. In Listing 6.3, fisherm is called with 0.9 as its third argument. Why do you think this
               is used? Try the same routine, but leave out the third argument (i.e. use
               w = fisherm(z, 24)). Can you explain what you see now? ( )
            6. Find an alternative method of preventing the singularities you saw in Exercise 6.5. Will
              the results be the same as those found using the original Listing 6.3? (  )
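The singularity behind Exercises 5 and 6 can be reproduced in a few lines of NumPy (an illustrative sketch: fisherm's own prestage uses PCA, while the shrinkage weight `lam` below is a hypothetical alternative, so the two fixes will generally not give identical results):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 10 samples in 24 dimensions. An estimated scatter matrix then
# has rank at most n_samples - 1 = 9, so it cannot be inverted -- the
# singularity that the 0.9 (retained-variance) argument guards against.
X = rng.normal(size=(10, 24))
S = np.cov(X, rowvar=False)
rank = np.linalg.matrix_rank(S)            # rank-deficient

# Alternative fix: shrink the estimate towards the identity so that it
# becomes invertible. lam is a hypothetical regularization weight.
lam = 0.1
S_reg = (1 - lam) * S + lam * np.eye(24)
rank_reg = np.linalg.matrix_rank(S_reg)    # full rank
```

Shrinkage keeps all 24 dimensions but biases the estimate, whereas the PCA prestage discards low-variance directions outright; the resulting mappings therefore differ.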
            7. What is the danger of optimizing the parameters of the feature extraction or selection
              stage, such as the number of features to retain, on the training set? How could you
              circumvent this? (0)
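A minimal Python/NumPy illustration of the standard remedy for Exercise 7: tune the number of retained features on a held-out validation part, never on the training part (all data and names here are synthetic stand-ins, not from the book):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-class data: 5 informative dimensions, 15 pure-noise dimensions.
n, d_inf, d_noise = 200, 5, 15
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d_inf + d_noise))
X[:, :d_inf] += y[:, None] * 1.5           # class shift on informative dims

# Hold out a validation set for choosing the number of features.
tr, va = np.arange(0, 140), np.arange(140, 200)

def nearest_mean_acc(train, test, feats):
    """Accuracy of a nearest-mean classifier on the given feature set."""
    m0 = X[np.ix_(train[y[train] == 0], feats)].mean(axis=0)
    m1 = X[np.ix_(train[y[train] == 1], feats)].mean(axis=0)
    Z = X[np.ix_(test, feats)]
    pred = (np.linalg.norm(Z - m1, axis=1) <
            np.linalg.norm(Z - m0, axis=1)).astype(int)
    return (pred == y[test]).mean()

# Rank features on the training part only (|mean difference| score) ...
score = np.abs(X[tr][y[tr] == 1].mean(0) - X[tr][y[tr] == 0].mean(0))
order = np.argsort(score)[::-1]

# ... then pick the number of features by validation accuracy.
accs = [nearest_mean_acc(tr, va, order[:k]) for k in range(1, 21)]
best_k = int(np.argmax(accs)) + 1
```

Because `best_k` is chosen on data the selection stage never saw, it is not tailored to accidental regularities of the training set; an outer test set (or cross-validation) would then give an unbiased error estimate.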