
302    Appendix B. CD Tools

                             B.3 Design Set Size


                             The PR Size program gives the PR designer some guidance on choosing an
                             appropriate system complexity for a given design set size. It has the
                             following modules:

                             SC Size (Statistical Classifier Design)

                             Displays a picture box containing graphs of the following variables, for a
                             two-class linear classifier with a specified Bhattacharyya distance and for
                             several values of the dimensionality ratio, n/d:

                               Bayes error;
                               Expected design set error (resubstitution method);
                               Expected test set error (holdout method).

                               Both classes are assumed to be represented by the same number of patterns
                             per class, n.
                               The user only has to specify the dimension d and the square of the
                             Bhattacharyya distance (computable by several statistical software products).
                               For any chosen value of n/d, the program also displays the standard
                             deviations of the error estimates when the mouse is clicked over a selected
                             point of the picture box.
                               The expected design and test set errors are computed using the formulas
                             presented in the work of Foley (1972) mentioned in the bibliography of Chapter 4.
                             The formula for the expected test set error is an approximation, which can
                             produce slightly erroneous values, below the Bayes error, for certain n/d ratios.
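
                               As a rough illustration of the Bayes error that serves as the reference
                             level in these graphs, the sketch below computes it for the textbook case of
                             two equiprobable Gaussian classes sharing one covariance matrix. It is not
                             the program's code and does not reproduce the Foley formulas. The function
                             names are hypothetical, and the input is assumed to be the squared
                             Mahalanobis distance between the class means; how this maps to the squared
                             Bhattacharyya distance requested by SC Size depends on the convention used
                             in Chapter 4 and is not attempted here.

```python
from math import erfc, sqrt


def bayes_error_equal_cov(delta_sq: float) -> float:
    """Bayes error for two equiprobable Gaussian classes with a common covariance.

    delta_sq is ASSUMED to be the squared Mahalanobis distance between the
    class means. The error is Phi(-delta / 2), with Phi the standard normal CDF.
    """
    if delta_sq < 0:
        raise ValueError("delta_sq must be non-negative")
    delta = sqrt(delta_sq)
    # Phi(-x) = 0.5 * erfc(x / sqrt(2)), evaluated at x = delta / 2.
    return 0.5 * erfc(delta / (2.0 * sqrt(2.0)))


def patterns_per_class(d: int, ratio: float) -> int:
    """Number of patterns per class, n, for a given dimensionality ratio n/d."""
    if d <= 0 or ratio <= 0:
        raise ValueError("d and ratio must be positive")
    return round(ratio * d)


if __name__ == "__main__":
    # Illustrative values only; they do not come from the program.
    for delta_sq in (1.0, 4.0, 9.0):
        print(delta_sq, bayes_error_equal_cov(delta_sq))
```

                               An estimated test set error that falls below the value returned by
                             bayes_error_equal_cov for the same class separation would signal the kind of
                             approximation artefact mentioned above.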



                             NN Size (Neural Network Design)
                             Displays tables of  the following values, for a two-class two-layer MLP and for a
                             user-specified interval of the number of hidden nodes, h:

                             - Number of neurons.
                             - Number of weights (including biases).
                             - Lower bound of the Vapnik-Chervonenkis dimension (formula 5-52).
                             - Upper bound of the Vapnik-Chervonenkis dimension (formula 5-56).
                             - Lower bound of learning set size needed for generalization (formula 5-53).
                             - Upper bound of learning set size sufficient for generalization (formula 5-56a).

                               The user has to specify the dimension, d, corresponding to the number of MLP
                             inputs, the training set error and the confidence level of the training set error.
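
                               The first two table columns follow directly from the network layout. The
                             sketch below is a minimal illustration, assuming a fully connected two-layer
                             MLP with d inputs, h hidden neurons and a single output neuron for the
                             two-class case, with input nodes not counted as neurons. The function names
                             mlp_size and size_table are hypothetical, and the bounds given by formulas
                             5-52 to 5-56a are not reproduced.

```python
from typing import List, Tuple


def mlp_size(d: int, h: int, outputs: int = 1) -> Tuple[int, int]:
    """Return (neurons, weights) for a fully connected two-layer MLP.

    ASSUMPTIONS: input nodes are not counted as neurons, and every hidden
    and output neuron has one bias weight.
    """
    if d <= 0 or h <= 0 or outputs <= 0:
        raise ValueError("d, h and outputs must be positive")
    neurons = h + outputs
    # Each hidden neuron has d weights plus a bias; each output neuron
    # has h weights plus a bias.
    weights = (d + 1) * h + (h + 1) * outputs
    return neurons, weights


def size_table(d: int, h_min: int, h_max: int) -> List[Tuple[int, int, int]]:
    """Rows of (h, neurons, weights) for h in the interval [h_min, h_max]."""
    return [(h,) + mlp_size(d, h) for h in range(h_min, h_max + 1)]


if __name__ == "__main__":
    for h, neurons, weights in size_table(d=2, h_min=1, h_max=4):
        print(h, neurons, weights)
```

                               Under these assumptions the number of weights is dh + 2h + 1, so it grows
                             linearly with h for a fixed input dimension d.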

                             Author: JP Marques de Sa, Engineering Faculty, Oporto University.