Page 93 - Introduction to Statistical Pattern Recognition

3 Hypothesis Testing                                           75



Test of normality: Despite its importance, it has been difficult to test whether a given data set is normal or not. If the dimensionality of the data is low, we could use a conventional chi-square test [10]. But the number of cells increases exponentially with the dimensionality (k bins per axis give k^n cells in n dimensions), and the test is impractical for high-dimensional data sets. Measuring the variance of $d^2$ provides an estimate of $\gamma$ in (3.54), which may be used to test a high-dimensional distribution for normality. Also, the parameter $\beta$ of a gamma density function could be determined according to (3.57). However, it must be cautioned that this procedure tests only one marginal aspect of the distribution, however important that aspect, and does not guarantee the overall normality of the distribution even if the samples pass the test.
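As a rough illustration of the variance-of-$d^2$ idea, the sketch below computes $d^2$ for every sample with M and Σ treated as known, and compares the sample mean and variance of $d^2$ with $n$ and $2n$, the chi-square values implied by a normal X. It does not reproduce the estimate of γ in (3.54); the helper name `d2_moments` and the uniform comparison data are illustrative assumptions.

```python
import numpy as np

def d2_moments(X, M, Sigma):
    """Sample mean and variance of d^2 = (X-M)^T Sigma^{-1} (X-M) over the rows of X.

    M and Sigma are treated as known (not estimated from X)."""
    Xc = X - M                                   # center each sample
    # Solve Sigma @ Y = Xc^T instead of forming Sigma^{-1} explicitly
    d2 = np.einsum("ij,ij->i", Xc, np.linalg.solve(Sigma, Xc.T).T)
    return d2.mean(), d2.var(ddof=1)

rng = np.random.default_rng(0)
n, T = 5, 100_000
M, Sigma = np.zeros(n), np.eye(n)

# Normal data: d^2 is chi-square with n degrees of freedom (mean n, variance 2n).
normal_mean, normal_var = d2_moments(rng.standard_normal((T, n)), M, Sigma)

# Non-normal data with the same mean and covariance: independent uniforms on
# [-sqrt(3), sqrt(3)] have unit variance, and then Var{d^2} = 0.8 n instead of 2n.
U = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(T, n))
uniform_mean, uniform_var = d2_moments(U, M, Sigma)

print(f"normal : mean {normal_mean:.2f} (n = {n}), var {normal_var:.2f} (2n = {2 * n})")
print(f"uniform: mean {uniform_mean:.2f} (n = {n}), var {uniform_var:.2f} (0.8n = {0.8 * n})")
```

Both data sets give a mean of $d^2$ close to n, so only the variance separates them; a non-normal distribution whose $d^2$ happened to have variance 2n would pass, which is exactly the caution stated above.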
When X is normal and M and Σ are given, the density function of $d^2 = (X-M)^T \Sigma^{-1} (X-M)$ is given in (3.59), which is a gamma distribution.
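Equation (3.59) is not on this page, so the sketch below assumes the standard chi-square form of the density of $d^2$ for normal X with known M and Σ, $p(d^2) = (d^2)^{n/2-1} e^{-d^2/2} / (2^{n/2}\Gamma(n/2))$, which is a gamma density with shape $n/2$ and scale 2. The text's own parametrization in (3.59) may differ; `d2_density` is a hypothetical helper name.

```python
import math

def d2_density(x, n):
    """Density of d^2 for n-dimensional normal X with known M and Sigma.

    Chi-square density with n degrees of freedom, i.e. a gamma density with
    shape n/2 and scale 2, evaluated in log space for numerical stability."""
    if x <= 0.0:
        return 0.0  # support is x > 0
    k = n / 2.0
    log_p = (k - 1.0) * math.log(x) - x / 2.0 - k * math.log(2.0) - math.lgamma(k)
    return math.exp(log_p)

# Midpoint-rule check: the density integrates to 1 and has mean n.
n, h = 4, 1e-3
grid = [(i + 0.5) * h for i in range(200_000)]   # covers 0 < x < 200
total = sum(d2_density(x, n) for x in grid) * h
mean = sum(x * d2_density(x, n) for x in grid) * h
print(f"integral = {total:.6f}, mean = {mean:.4f} (n = {n})")
```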
This may be extended to the case where the sample mean and sample covariance matrix are used in place of M and Σ as

$$\xi = \frac{1}{N-1}\,(X-\hat{M})^T\,\hat{\Sigma}^{-1}\,(X-\hat{M}) , \tag{3.71}$$
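where X is one of the N samples $X_1, \dots, X_N$ from which the estimates are computed, and

$$\hat{M} = \frac{1}{N}\sum_{j=1}^{N} X_j , \qquad \hat{\Sigma} = \frac{1}{N}\sum_{j=1}^{N} (X_j-\hat{M})(X_j-\hat{M})^T . \tag{3.72}$$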
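The sketch below evaluates ξ of (3.71) for each of the N samples, assuming $\hat{\Sigma}$ uses divisor N (the maximum-likelihood estimate), which is the choice that places ξ in [0, 1]. Because $\sum_j (X_j-\hat M)^T\hat\Sigma^{-1}(X_j-\hat M) = \operatorname{tr}(\hat\Sigma^{-1} N\hat\Sigma) = nN$, the ξ values always sum to $nN/(N-1)$, a convenient check on any implementation. The name `xi_statistics` and the mixing matrix are hypothetical.

```python
import numpy as np

def xi_statistics(X):
    """xi_j = (X_j - M_hat)^T Sigma_hat^{-1} (X_j - M_hat) / (N - 1) for every row X_j.

    Assumes Sigma_hat uses divisor N (maximum-likelihood estimate)."""
    N, n = X.shape
    M_hat = X.mean(axis=0)
    Xc = X - M_hat
    Sigma_hat = Xc.T @ Xc / N
    d2 = np.einsum("ij,ij->i", Xc, np.linalg.solve(Sigma_hat, Xc.T).T)
    return d2 / (N - 1)

rng = np.random.default_rng(1)
N, n = 12, 3
A = np.array([[2.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, -0.3, 0.7]])          # hypothetical mixing matrix
X = rng.standard_normal((N, n)) @ A       # correlated normal data
xi = xi_statistics(X)
print(np.round(xi, 3))
print(xi.sum(), n * N / (N - 1))          # equal up to rounding
```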
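When X is normal, ξ has the beta distribution given by [11]

$$p(\xi) = \frac{\Gamma\!\left(\frac{N-1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)\Gamma\!\left(\frac{N-n-1}{2}\right)}\;\xi^{(n-2)/2}\,(1-\xi)^{(N-n-3)/2} , \qquad 0 \le \xi \le 1 . \tag{3.73}$$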
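To check the beta law numerically, the sketch below draws many normal data sets, evaluates ξ for the first sample of each (so X is one of the N samples used in $\hat M$ and $\hat\Sigma$), and compares the empirical mean with the beta mean $a/(a+b) = n/(N-1)$, where $a = n/2$ and $b = (N-n-1)/2$. It assumes $\hat\Sigma$ has divisor N. Since ξ is unchanged by any nonsingular affine transformation of the data, standard normal samples suffice. The function names are hypothetical.

```python
import math
import numpy as np

def beta_xi_density(xi, n, N):
    """Beta density with a = n/2, b = (N-n-1)/2, for 0 < xi < 1."""
    a, b = n / 2.0, (N - n - 1) / 2.0
    log_c = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_c + (a - 1.0) * math.log(xi) + (b - 1.0) * math.log1p(-xi))

def simulate_xi(n, N, trials, rng):
    """xi of the first sample in each of `trials` normal data sets of size N."""
    X = rng.standard_normal((trials, N, n))
    Xc = X - X.mean(axis=1, keepdims=True)                   # subtract M_hat per data set
    Sigma_hat = np.einsum("tki,tkj->tij", Xc, Xc) / N        # divisor N
    x0 = Xc[:, 0, :]                                         # first sample, centered
    y = np.linalg.solve(Sigma_hat, x0[..., None])[..., 0]    # Sigma_hat^{-1} x0
    return np.einsum("ti,ti->t", x0, y) / (N - 1)

n, N = 3, 10
xi = simulate_xi(n, N, 20_000, np.random.default_rng(2))

# Midpoint-rule check that the density integrates to 1 over (0, 1).
h = 1e-4
total = sum(beta_xi_density((i + 0.5) * h, n, N) for i in range(10_000)) * h
print(f"density integrates to {total:.5f}")
print(f"simulated mean {xi.mean():.4f}, beta mean n/(N-1) = {n / (N - 1):.4f}")
```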




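The expected value and variance of ξ may be computed by using

$$\int_0^1 \xi^{\alpha}(1-\xi)^{\beta}\,d\xi = \frac{\Gamma(\alpha+1)\,\Gamma(\beta+1)}{\Gamma(\alpha+\beta+2)} . \tag{3.74}$$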
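Applying the standard beta integral to $E\{\xi^m\} = \int_0^1 \xi^m p(\xi)\,d\xi$ with the beta density of ξ gives $E\{\xi^m\} = \Gamma(n/2+m)\,\Gamma((N-1)/2) / [\Gamma(n/2)\,\Gamma((N-1)/2+m)]$. The sketch below evaluates this in log space and forms the mean and variance. The closed forms it is compared with, $n/(N-1)$ and $2n(N-n-1)/[(N-1)^2(N+1)]$, are the standard beta mean and variance for $a = n/2$, $b = (N-n-1)/2$, derived here rather than quoted from the text; `xi_moment` and `xi_mean_var` are hypothetical names.

```python
import math

def xi_moment(m, n, N):
    """E{xi^m} for a beta density with a = n/2 and a + b = (N-1)/2."""
    a, a_plus_b = n / 2.0, (N - 1) / 2.0
    return math.exp(math.lgamma(a + m) + math.lgamma(a_plus_b)
                    - math.lgamma(a) - math.lgamma(a_plus_b + m))

def xi_mean_var(n, N):
    mean = xi_moment(1, n, N)
    return mean, xi_moment(2, n, N) - mean ** 2   # Var = E{xi^2} - E{xi}^2

n, N = 3, 10
mean, var = xi_mean_var(n, N)
print(mean, n / (N - 1))                                     # both ≈ 0.3333
print(var, 2 * n * (N - n - 1) / ((N - 1) ** 2 * (N + 1)))   # both ≈ 0.0404
```

Working through `math.lgamma` rather than `math.gamma` avoids overflow of Γ when N is large.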


                     The results are