\[
= \int_{h(X)=0} \frac{1}{N}\, g(X, Y)\, p(X)\, dX = 0 \, . \tag{5.162}
\]

That is, as long as g(X, Y) = 0 at h(X) = 0, the effect of an individual sample is
negligible. Even if the quadratic classifier is not optimal, Δε̄ is dominated by
a 1/N term. Thus, as one would expect, as the number of design samples
becomes larger, the effect of an individual sample diminishes.
                           In order to confirm the above results,  the following  experiment was con-
                      ducted.
Experiment 9: Effect of removing one sample
    Data: I-I, I-4I, I-Λ (Normal, n = 8)
    Classifier: Quadratic classifier of (5.54)
    Design samples: N₁ = N₂ = 24, 40, 80, 160, 320
    Test: Theoretical using (3.119)-(3.128)
    No. of trials: τ = 10
    Results: Table 5-11 [6]
Table 5-11 shows that, even if the squared distance of Y (∈ ω₁) from M₁, d², is
much larger than n, the effect is still negligible. The expected value of d² is n
when X is distributed normally, since the normalized distance
(X − M₁)ᵀ Σ₁⁻¹ (X − M₁) follows a χ² distribution with n degrees of freedom.

                      5.4  Bootstrap Methods
                      Bootstrap Errors

Bootstrap method: So far, we have studied how to bound the Bayes
error based on available sample sets. That is, we draw τ sample sets
S₁, . . . , S_τ from the true distributions, P, as seen in Fig. 5-5, where each
sample set contains N₁ ω₁-samples and N₂ ω₂-samples. For each Sₜ, we can
apply the L and R methods to obtain ε̂_Lt and ε̂_Rt. The averages of these ε̂_Lt's
and ε̂_Rt's over τ sets approximate the upper and lower bounds of the Bayes
error, E{ε̂_L} and E{ε̂_R}. The standard deviations of the τ ε̂_Lt's and ε̂_Rt's indicate
how ε̂_L and ε̂_R vary. However, in many cases in practice, only one sample set
is available.
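As an illustration of this bounding procedure, here is a minimal Python sketch under assumed conditions (Gaussian classes with illustrative means, a plug-in quadratic classifier): it draws τ sample sets, computes the resubstitution (R) and leave-one-out (L) error estimates on each, and averages them to approximate the lower and upper bounds.

```python
# Sketch: average the R (resubstitution) and L (leave-one-out) error
# estimates over tau independently drawn sample sets. Class means and
# set sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, N, tau = 8, 50, 10                     # dimension, samples per class, sample sets
M1, M2 = np.zeros(n), np.full(n, 0.8)     # assumed class means

def loglik(X, mean, cov):
    """Gaussian log-likelihood (up to a constant) used as the discriminant."""
    d = np.atleast_2d(X) - mean
    inv = np.linalg.inv(cov)
    return -0.5 * np.einsum('ij,jk,ik->i', d, inv, d) - 0.5 * np.linalg.slogdet(cov)[1]

def errors(X1, X2):
    """Return (resubstitution, leave-one-out) error estimates for one sample set."""
    m1, c1 = X1.mean(0), np.cov(X1.T)
    m2, c2 = X2.mean(0), np.cov(X2.T)
    # R method: test the design samples themselves (optimistically biased).
    eR = 0.5 * (np.mean(loglik(X1, m1, c1) < loglik(X1, m2, c2)) +
                np.mean(loglik(X2, m2, c2) < loglik(X2, m1, c1)))
    # L method: redesign the own-class statistics without each test sample.
    miss = 0
    for i in range(len(X1)):
        rest = np.delete(X1, i, axis=0)
        miss += loglik(X1[i], rest.mean(0), np.cov(rest.T))[0] < loglik(X1[i], m2, c2)[0]
    for i in range(len(X2)):
        rest = np.delete(X2, i, axis=0)
        miss += loglik(X2[i], rest.mean(0), np.cov(rest.T))[0] < loglik(X2[i], m1, c1)[0]
    return eR, miss / (len(X1) + len(X2))

eRs, eLs = zip(*(errors(rng.standard_normal((N, n)) + M1,
                        rng.standard_normal((N, n)) + M2) for _ in range(tau)))
print(f"mean eR over {tau} sets = {np.mean(eRs):.4f} (lower bound estimate)")
print(f"mean eL over {tau} sets = {np.mean(eLs):.4f} (upper bound estimate)")
```

The averaged ε̂_R should come out below the averaged ε̂_L, with the two bracketing the error in the manner described above.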