
48     2 Pattern Discrimination





















         Figure 2.25. Two class scatter plot with dimensionality ratio n/d = 30.



           This is a dramatic example of how using few patterns relative to the number of
         features (i.e., a low dimensionality ratio, n/d) can lead to totally wrong
         conclusions about the performance of a classifier (or regressor) evaluated on a
         training set. We can get more insight into this dimensionality problem by asking
         how many patterns one needs in order to design a classifier, i.e., what the
         minimum size of the training set is. Suppose that we could train the classifier by
         deducing a rule based on the location of each pattern in the d-dimensional space.
         In a certain sense, this is in fact how the neural network approach works. In order
         to have sufficient resolution, we assume that the range of values of each feature
         is divided into m intervals; we therefore have to assess the location of each
         pattern in each of the m^d hypercubes. This number of hypercubes grows
         exponentially with d, so that for a value of d that is not too low we have to find
         a mapping for a quite sparsely occupied space, i.e., with a poor representation of
         the mapping.
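         The growth in the number of hypercubes can be illustrated with a minimal sketch.
         The function names and the values n = 1000 and m = 10 are assumptions chosen for
         illustration only; they do not come from the text. The second function gives an
         upper bound on the fraction of hypercubes that n patterns can occupy, since each
         pattern falls in at most one hypercube.

```python
def hypercube_count(m: int, d: int) -> int:
    """Number of cells when each of the d feature ranges is split into m intervals."""
    return m ** d


def max_occupied_fraction(n: int, m: int, d: int) -> float:
    """Upper bound on the fraction of cells that n patterns can occupy.

    Each pattern lies in exactly one cell, so at most n cells are non-empty.
    """
    return min(1.0, n / hypercube_count(m, d))


if __name__ == "__main__":
    n, m = 1000, 10  # hypothetical training-set size and per-feature resolution
    for d in (1, 2, 3, 6, 12):
        print(d, hypercube_count(m, d), max_occupied_fraction(n, m, d))
```

         With these assumed values, the bound stays at 1 up to d = 3 and then drops by a
         factor of m for every additional feature, which is the sparsity described above.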
            This phenomenon, generally called the curse of dimensionality, also affects our
         common intuition about the concept of neighbourhood. In order to see this, imagine
         that we have a one-dimensional normal distribution. We know then that about 68% of
         the distributed values lie within one standard deviation around the mean. If we
         increase our representation to two independent dimensions, only about 46% of the
         values lie within one standard deviation of the mean in both dimensions, and for a
         d-dimensional representation only (0.68)^d x 100% of the samples do so. This
         region is a hypercube enclosing the hypersphere with a radius of one standard
         deviation, so the hypersphere itself contains even fewer samples. As shown in
         Figure 2.26, this means that for d=12 less than 1% of the data is in the
         neighbourhood of the mean! For the well-known 95% neighbourhood, corresponding
         approximately to two standard deviations, we will find only about 54% of the
         patterns for d=12.
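         The figures quoted above follow from multiplying the one-dimensional coverage d
         times, which a short sketch can reproduce. The function names are assumptions made
         for illustration; 0.68 and 0.95 are the rounded coverages used in the text. Strictly,
         the product p^d is the probability that every coordinate lies within the given
         distance of the mean (a hypercube). For comparison, the second function evaluates
         the fraction inside a hypersphere using the closed form of the chi-square
         distribution, which in this sketch is restricted to even d.

```python
import math


def fraction_all_coords(p: float, d: int) -> float:
    """Fraction of samples whose d independent coordinates all fall in an
    interval that covers a fraction p of a one-dimensional distribution."""
    return p ** d


def fraction_in_hypersphere(k: float, d: int) -> float:
    """Fraction of a standard d-dimensional normal distribution inside a
    hypersphere of radius k standard deviations (closed form, even d only)."""
    if d <= 0 or d % 2 != 0:
        raise ValueError("this sketch only handles positive even d")
    half = k * k / 2.0
    # Chi-square CDF with d = 2j degrees of freedom is an Erlang CDF.
    partial = sum(half ** j / math.factorial(j) for j in range(d // 2))
    return 1.0 - math.exp(-half) * partial


if __name__ == "__main__":
    for d in (2, 12):
        print(d, fraction_all_coords(0.68, d), fraction_all_coords(0.95, d),
              fraction_in_hypersphere(1.0, d))
```

         The hypersphere fraction is never larger than the product p^d for the same
         radius, so the product rule already gives an optimistic picture of how populated
         the neighbourhood of the mean is.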
            The dimensionality ratio is a central issue in PR, with a deep influence on the
         quality of any PR project; we will therefore give it special attention at every
         opportune moment.