Page 93 - Introduction to Statistical Pattern Recognition
3 Hypothesis Testing
Test of normality: Despite its importance, it has been difficult to test
whether a given data set is normal or not. If the dimensionality of the data is
low, we could use a conventional chi-square test [10]. However, the
number of cells increases exponentially with the dimensionality, and the test is
impractical for high-dimensional data sets. Measuring the variance of $d^2$
provides an estimate of $\gamma$ in (3.54), which may be used to test for normality of a
high-dimensional distribution. Also, the parameter $\beta$ could be determined for a
gamma density function according to (3.57). However, it must be cautioned
that this procedure tests only one marginal aspect of the distribution, important
as that aspect is, and does not guarantee the overall normality of the
distribution even if the samples pass the test.
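The variance check described above can be sketched as follows. This is a minimal illustration, not the book's procedure: it assumes that $\gamma$ in (3.54) is the ratio $\mathrm{Var}\{d^2\}/n$ (equation (3.54) is not reproduced here), and it relies on the fact that for a normal $X$ with known $M$ and $\Sigma$, $d^2$ is chi-square distributed with $n$ degrees of freedom, so that $E\{d^2\} = n$ and $\mathrm{Var}\{d^2\} = 2n$. The function names, the mixing matrix, and the uniform comparison case are all hypothetical choices made for the illustration.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def squared_distance(x, mean, cov):
    """d^2 = (X - M)^T Sigma^{-1} (X - M) for a known mean and covariance."""
    v = [xi - mi for xi, mi in zip(x, mean)]
    w = solve(cov, v)  # w = Sigma^{-1} (X - M)
    return sum(vi * wi for vi, wi in zip(v, w))

def distance_moments(samples, mean, cov):
    """Return (sample mean of d^2, sample variance of d^2 divided by n)."""
    n = len(mean)
    d2 = [squared_distance(x, mean, cov) for x in samples]
    avg = sum(d2) / len(d2)
    var = sum((d - avg) ** 2 for d in d2) / (len(d2) - 1)
    return avg, var / n  # var / n is the assumed estimate of gamma

def draw(rng, mean, mix, noise):
    """Draw X = M + A z, where z has independent zero-mean, unit-variance components."""
    z = [noise(rng) for _ in mean]
    return [m + sum(a * zj for a, zj in zip(row, z)) for m, row in zip(mean, mix)]

def demo_gamma(count=20000, seed=0):
    """Compare a normal source with a non-normal (uniform) source of equal covariance."""
    rng = random.Random(seed)
    mean = [1.0, -2.0, 0.5, 3.0]
    mix = [[1.0, 0.0, 0.0, 0.0],
           [0.5, 1.2, 0.0, 0.0],
           [-0.3, 0.4, 0.8, 0.0],
           [0.2, -0.1, 0.6, 1.5]]
    # Known covariance Sigma = A A^T for both sources.
    cov = [[sum(p * q for p, q in zip(r1, r2)) for r2 in mix] for r1 in mix]
    sources = {
        "normal": lambda r: r.gauss(0.0, 1.0),
        "uniform": lambda r: r.uniform(-3 ** 0.5, 3 ** 0.5),  # unit variance
    }
    results = {}
    for name, noise in sources.items():
        samples = [draw(rng, mean, mix, noise) for _ in range(count)]
        results[name] = distance_moments(samples, mean, cov)
    return results
```

Under these assumptions the normal source should give a ratio close to 2, while the uniform source, which has the same mean and covariance, should give a ratio close to 0.8. A ratio far from 2 is evidence against normality; a ratio near 2 is only consistent with it, in line with the caution above.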
When $X$ is normal and $M$ and $\Sigma$ are given, the density function of
$d^2 = (X-M)^T \Sigma^{-1} (X-M)$ is given in (3.59), which is a gamma distribution.
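This may be extended to the case where the sample mean and sample covariance
matrix are used in place of $M$ and $\Sigma$ as

$$\zeta = \frac{1}{N-1} (X-\hat{M})^T \hat{\Sigma}^{-1} (X-\hat{M}) , \tag{3.71}$$

where $\hat{M}$ and $\hat{\Sigma}$ are the sample mean and sample covariance matrix. (3.72)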
When $X$ is normal, $\zeta$ has the beta distribution given by [11]

$$0 \le \zeta \le 1 . \tag{3.73}$$
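The computation of $\zeta$ from sample estimates can be sketched as below. This is an illustration under stated assumptions rather than the book's definition: the normalization of the sample covariance matrix is not reproduced here, so the divisor `N - ddof` is exposed as a hypothetical parameter (`ddof=1` gives the unbiased estimate, `ddof=0` the maximum-likelihood one), and each $X$ is taken to be one of the $N$ samples used to form the estimates. All function names are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def sample_mean(samples):
    """Component-wise average of the sample vectors."""
    count = len(samples)
    return [sum(col) / count for col in zip(*samples)]

def sample_covariance(samples, mean, ddof=1):
    """Scatter matrix about the sample mean, divided by (N - ddof); ddof is an assumption."""
    count, n = len(samples), len(mean)
    cov = [[0.0] * n for _ in range(n)]
    for x in samples:
        v = [xi - mi for xi, mi in zip(x, mean)]
        for i in range(n):
            for j in range(n):
                cov[i][j] += v[i] * v[j]
    return [[c / (count - ddof) for c in row] for row in cov]

def zeta_values(samples, ddof=1):
    """zeta = (X - M_hat)^T Sigma_hat^{-1} (X - M_hat) / (N - 1) for every sample X."""
    count = len(samples)
    mean = sample_mean(samples)
    cov = sample_covariance(samples, mean, ddof)
    zetas = []
    for x in samples:
        v = [xi - mi for xi, mi in zip(x, mean)]
        w = solve(cov, v)  # w = Sigma_hat^{-1} (X - M_hat)
        zetas.append(sum(vi * wi for vi, wi in zip(v, w)) / (count - 1))
    return zetas

def demo_zeta(count=200, n=3, seed=1):
    """Compute zeta for synthetic normal samples with zero mean and identity covariance."""
    rng = random.Random(seed)
    samples = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(count)]
    return zeta_values(samples)
```

Two properties hold for any data set with a nonsingular sample covariance, so they make convenient sanity checks: every value lies in $[0, 1]$, and the average of $\zeta$ over the $N$ samples equals $n(N - \text{ddof}) / (N(N-1))$, which is $n/N$ for `ddof=1` and $n/(N-1)$ for `ddof=0`. The second property follows from the trace identity $\sum_i (X_i-\hat{M})^T \hat{\Sigma}^{-1} (X_i-\hat{M}) = n(N - \text{ddof})$.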
The expected value and variance of $\zeta$ may be computed by using
(3.74)
The results are