Page 184 - Computational Statistics Handbook with MATLAB
Chapter 5: Exploratory Data Analysis 171
$B_k$ is a box in the projection plane.
$I_{B_k}$ is the indicator function for region $B_k$.
$\eta_j = \pi j / 36$, $j = 0, \ldots, 8$, is the angle by which the data are rotated in
the plane before being assigned to regions $B_k$.
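$\alpha(\eta_j)$ and $\beta(\eta_j)$ are given by
$$
\begin{aligned}
\alpha(\eta_j) &= \alpha \cos \eta_j - \beta \sin \eta_j \\
\beta(\eta_j) &= \alpha \sin \eta_j + \beta \cos \eta_j
\end{aligned}
\tag{5.14}
$$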
c is a scalar that determines the size of the neighborhood around
$(\alpha^*, \beta^*)$ that is visited in the search for planes that provide better
values for the projection pursuit index.
v is a vector uniformly distributed on the unit d-dimensional sphere.
half specifies the number of steps without an increase in the projection
index, at which time the value of the neighborhood is halved.
m represents the number of searches or random starts to find the best
plane.
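Two of the quantities defined above lend themselves to a short illustration: the rotated projection vectors of equation (5.14) and the random direction v. The following is a minimal Python sketch, not the book's MATLAB code. The function names `rotation_angles`, `rotate_plane` and `random_unit_vector` are hypothetical, and drawing v by normalizing a standard normal vector is one common way to sample uniformly on the sphere, assumed here rather than taken from the text.

```python
import math
import random


def rotation_angles():
    """Return the nine angles eta_j = pi*j/36 for j = 0, ..., 8."""
    return [math.pi * j / 36 for j in range(9)]


def rotate_plane(alpha, beta, eta):
    """Apply equation (5.14) to the projection vectors alpha and beta.

    alpha and beta are sequences of length d; eta is an angle in radians.
    """
    cs, sn = math.cos(eta), math.sin(eta)
    alpha_eta = [cs * a - sn * b for a, b in zip(alpha, beta)]
    beta_eta = [sn * a + cs * b for a, b in zip(alpha, beta)]
    return alpha_eta, beta_eta


def random_unit_vector(d, rng=random):
    """Draw v uniformly on the unit sphere in d dimensions.

    Assumption: v is obtained by normalizing a vector of independent
    standard normal draws, which is a standard construction.
    """
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 0.0:  # guard against the (practically impossible) zero vector
            return [x / norm for x in g]
```

Because (5.14) is a rotation within the plane spanned by α and β, an orthonormal pair stays orthonormal for every η_j, so the rotated vectors describe the same plane viewed at a different angle.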
Projection Pursuit Index
Posse [1995a, 1995b] developed an index based on the chi-square. The plane
is first divided into 48 regions or boxes $B_k$ that are distributed in rings. See
Figure 5.44 for an illustration of how the plane is partitioned. All regions have
the same angular width of 45 degrees and the inner regions have the same
radial width of $(2 \log 6)^{1/2} / 5$. This choice for the radial width provides
regions with approximately the same probability for the standard bivariate
normal distribution. The regions in the outer ring have probability $1/48$. The
regions are constructed in this way to account for the radial symmetry of the
bivariate normal distribution.
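The partition described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the book's implementation: the names `region_index` and `region_probabilities` are hypothetical, the boxes are numbered ring by ring from the center outward (the text does not fix an ordering), and the outer ring is taken to be everything beyond radius $(2 \log 6)^{1/2}$. The probabilities use the fact that, for the standard bivariate normal, the radius $R$ satisfies $P(R > r) = \exp(-r^2/2)$.

```python
import math

N_INNER_RINGS = 5                             # inner rings of equal radial width
N_SECTORS = 8                                 # 45-degree sectors per ring
R_OUTER = math.sqrt(2.0 * math.log(6.0))      # radius where the outer ring starts
RING_WIDTH = R_OUTER / N_INNER_RINGS          # (2 log 6)^(1/2) / 5


def region_index(x, y):
    """Return k in 0..47 for the box containing (x, y).

    Assumed numbering: k = 8 * ring + sector, with ring 5 the unbounded
    outer ring and sector 0 starting at the positive x-axis.
    """
    ring = min(int(math.hypot(x, y) / RING_WIDTH), N_INNER_RINGS)
    theta = math.atan2(y, x) % (2.0 * math.pi)
    sector = min(int(theta / (math.pi / 4.0)), N_SECTORS - 1)
    return ring * N_SECTORS + sector


def region_probabilities():
    """Standard bivariate normal probability of each of the 48 boxes."""
    probs = []
    for ring in range(N_INNER_RINGS + 1):
        lo = ring * RING_WIDTH
        upper_tail_lo = math.exp(-0.5 * lo * lo)          # P(R > lo)
        if ring < N_INNER_RINGS:
            hi = (ring + 1) * RING_WIDTH
            upper_tail_hi = math.exp(-0.5 * hi * hi)      # P(R > hi)
        else:
            upper_tail_hi = 0.0                           # outer ring is unbounded
        probs.extend([(upper_tail_lo - upper_tail_hi) / N_SECTORS] * N_SECTORS)
    return probs
```

With this construction the outer ring carries total probability $\exp(-\log 6) = 1/6$, which split over eight sectors gives the $1/48$ quoted in the text.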
Posse [1995a, 1995b] provides the population version of the projection
index. We present only the empirical version here, because that is the one that
must be implemented on the computer. The projection index is given by
$$
PI_{\chi^2}(\alpha, \beta) = \frac{1}{9} \sum_{j=0}^{8} \sum_{k=1}^{48} \frac{1}{c_k} \left[ \frac{1}{n} \sum_{i=1}^{n} I_{B_k}\left( z_i^{\alpha(\eta_j)}, z_i^{\beta(\eta_j)} \right) - c_k \right]^2 . \tag{5.15}
$$
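A hedged Python sketch of the empirical index (5.15) follows. It is a minimal illustration rather than the book's MATLAB routine, and it rests on several assumptions: the data `Z` are already sphered, `alpha` and `beta` are orthonormal, $c_k$ is the standard bivariate normal probability of box $B_k$, the boxes are numbered ring by ring, and the average runs over all nine angles $\eta_j$, $j = 0, \ldots, 8$, in keeping with the factor $1/9$. The names `chi2_index`, `region_index` and `region_probabilities` are hypothetical.

```python
import math

RING_WIDTH = math.sqrt(2.0 * math.log(6.0)) / 5.0   # radial width of inner rings


def region_index(x, y):
    """Box number in 0..47 (assumed ordering: 8 * ring + sector)."""
    ring = min(int(math.hypot(x, y) / RING_WIDTH), 5)
    theta = math.atan2(y, x) % (2.0 * math.pi)
    sector = min(int(theta / (math.pi / 4.0)), 7)
    return 8 * ring + sector


def region_probabilities():
    """c_k: standard bivariate normal probability of each box."""
    # tail[r] = P(R > r * RING_WIDTH); the outer ring extends to infinity.
    tail = [math.exp(-0.5 * (r * RING_WIDTH) ** 2) for r in range(6)] + [0.0]
    return [(tail[r] - tail[r + 1]) / 8.0 for r in range(6) for _ in range(8)]


def chi2_index(Z, alpha, beta):
    """Empirical chi-square projection index for the plane (alpha, beta).

    Z is a list of n observations, each a sequence of length d.
    """
    n = len(Z)
    c = region_probabilities()
    # Project every observation onto alpha and beta once.
    za = [sum(zi * a for zi, a in zip(z, alpha)) for z in Z]
    zb = [sum(zi * b for zi, b in zip(z, beta)) for z in Z]
    total = 0.0
    for j in range(9):
        eta = math.pi * j / 36.0
        cs, sn = math.cos(eta), math.sin(eta)
        counts = [0] * 48
        for a, b in zip(za, zb):
            # Projections onto alpha(eta_j) and beta(eta_j), per (5.14).
            counts[region_index(cs * a - sn * b, sn * a + cs * b)] += 1
        total += sum((counts[k] / n - c[k]) ** 2 / c[k] for k in range(48))
    return total / 9.0
```

Rotating the projected coordinates, rather than recomputing the projections for each rotated pair of vectors, gives the same values because the projection is linear in α and β. Larger values of the index indicate a projection whose box frequencies depart further from those expected under the standard bivariate normal.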
The chi-square projection index is not affected by the presence of outliers.
This means that an interesting projection obtained using this index will not
be one that is interesting solely because of outliers, unlike some of the other
indexes (see Appendix C). It is sensitive to distributions that have a hole in
the core, and it will also yield projections that contain clusters. The chi-square
projection pursuit index is fast and easy to compute, making it appropriate
© 2002 by Chapman & Hall/CRC