Page 183 - Computational Statistics Handbook with MATLAB

Posse [1995a, 1995b] uses a random search to locate the global optimum of the
projection index and combines it with the structure removal of Friedman
[1987] to obtain a sequence of interesting 2-D projections. Each projection found
shows a structure that is less important (in terms of the projection index) than
the previous one. Before we describe this method for PPEDA, we give a summary
of the notation that we use in projection pursuit exploratory data analysis.

NOTATION - PROJECTION PURSUIT EXPLORATORY DATA ANALYSIS
$\mathbf{X}$ is an $n \times d$ matrix, where each row $\mathbf{X}^{(i)}$ corresponds to a $d$-dimensional observation and $n$ is the sample size.
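$\mathbf{Z}$ is the sphered version of $\mathbf{X}$, that is, the data transformed to have zero sample mean and identity sample covariance.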
$\hat{\boldsymbol{\mu}}$ is the $1 \times d$ sample mean:

$$\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{X}^{(i)} \tag{5.10}$$
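$\hat{\boldsymbol{\Sigma}}$ is the $d \times d$ sample covariance matrix:

$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n-1}\sum_{i=1}^{n}\left(\mathbf{X}^{(i)} - \hat{\boldsymbol{\mu}}\right)^T\left(\mathbf{X}^{(i)} - \hat{\boldsymbol{\mu}}\right) \tag{5.11}$$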
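$\boldsymbol{\alpha}, \boldsymbol{\beta}$ are orthonormal ($\boldsymbol{\alpha}^T\boldsymbol{\alpha} = 1 = \boldsymbol{\beta}^T\boldsymbol{\beta}$ and $\boldsymbol{\alpha}^T\boldsymbol{\beta} = 0$) $d$-dimensional vectors that span the projection plane.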
$P(\boldsymbol{\alpha}, \boldsymbol{\beta})$ is the projection plane spanned by $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$.
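$z_i^{\alpha}, z_i^{\beta}$ are the sphered observations projected onto the vectors $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$, where $\mathbf{z}_i$ is the $i$-th sphered observation written as a column vector:

$$z_i^{\alpha} = \mathbf{z}_i^T\boldsymbol{\alpha}, \qquad z_i^{\beta} = \mathbf{z}_i^T\boldsymbol{\beta} \tag{5.12}$$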
$(\boldsymbol{\alpha}^*, \boldsymbol{\beta}^*)$ denotes the plane where the index is maximum.
$PI_{\chi^2}(\boldsymbol{\alpha}, \boldsymbol{\beta})$ denotes the chi-square projection index evaluated using the data projected onto the plane spanned by $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$.
$\phi_2$ is the standard bivariate normal density.
$c_k$ is the probability evaluated over the $k$-th region $B_k$ using the standard bivariate normal,

$$c_k = \iint_{B_k} \phi_2 \, dz_1 \, dz_2 . \tag{5.13}$$
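The quantities in this notation can be made concrete with a short sketch. The following is a minimal NumPy illustration, not the handbook's MATLAB code, assuming the rows of X are observations. The function names (`sample_mean`, `sample_cov`, `sphere`, `project`, `annular_sector_prob`) are hypothetical and chosen only for illustration. Sphering here uses the eigendecomposition of the sample covariance, which is one common choice. The regions $B_k$ are not defined on this page, so the $c_k$ helper assumes, purely for illustration, that a region is an annular sector in polar coordinates; in that case the integral in (5.13) has a closed form because the squared radius under $\phi_2$ is chi-square with 2 degrees of freedom and the angle is uniform and independent of it.

```python
import math

import numpy as np


def sample_mean(X):
    """Eq. (5.10): the 1 x d sample mean, i.e. the column averages of X."""
    return X.mean(axis=0)


def sample_cov(X):
    """Eq. (5.11): the d x d sample covariance matrix with divisor n - 1."""
    n = X.shape[0]
    D = X - sample_mean(X)  # row i is X(i) - mu_hat
    return D.T @ D / (n - 1)


def sphere(X):
    """Return Z with zero sample mean and identity sample covariance.

    Uses Sigma_hat = Q diag(lam) Q^T and maps each centered row x to
    x Q diag(lam)^(-1/2). Assumes Sigma_hat is full rank.
    """
    lam, Q = np.linalg.eigh(sample_cov(X))
    # Dividing by sqrt(lam) scales column j of (X - mu) Q by 1/sqrt(lam_j).
    return (X - sample_mean(X)) @ Q / np.sqrt(lam)


def project(Z, alpha, beta):
    """Eq. (5.12): the coordinates z_i^alpha and z_i^beta of every sphered row."""
    # P(alpha, beta) requires an orthonormal pair of d-vectors.
    if not (np.isclose(alpha @ alpha, 1.0) and np.isclose(beta @ beta, 1.0)
            and np.isclose(alpha @ beta, 0.0)):
        raise ValueError("alpha and beta must be orthonormal")
    return Z @ alpha, Z @ beta


def annular_sector_prob(r1, r2, theta1, theta2):
    """Eq. (5.13) for an ASSUMED region shape B_k:
    {r1 <= radius < r2, theta1 <= angle < theta2}.

    Under phi_2, P(radius > r) = exp(-r^2 / 2) and the angle is uniform
    on [0, 2*pi), independent of the radius.
    """
    radial = math.exp(-r1 ** 2 / 2) - math.exp(-r2 ** 2 / 2)
    return radial * (theta2 - theta1) / (2 * math.pi)


# Usage: correlated, shifted data in d = 3 dimensions.
rng = np.random.default_rng(1)
A = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.3, 0.2]])
X = rng.normal(size=(500, 3)) @ A + 5.0
Z = sphere(X)
alpha = np.array([1.0, 0.0, 0.0])
beta = np.array([0.0, 1.0, 0.0])
z_alpha, z_beta = project(Z, alpha, beta)
```

Checking that `sample_cov(Z)` is close to the identity is a quick way to confirm the sphering step. With the annular-sector assumption, the $c_k$ values for a set of regions that tile the plane sum to 1.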
