Projection Pursuit

The Andrews curves and parallel coordinate plots are attempts to visualize all of the data points and all of the dimensions at once. An Andrews curve accomplishes this by mapping a data point to a curve. Parallel coordinate displays accomplish this by mapping each observation to a polygonal line with vertices on parallel axes. Another option is to tackle the problem of visualizing multi-dimensional data by reducing the data to a smaller dimension via a suitable projection. These methods reduce the data to 1-D or 2-D by projecting onto a line or a plane and then displaying each point in some suitable graphic, such as a scatterplot. Once the data are reduced to something that can be easily viewed, exploring them for patterns or interesting structure becomes possible.
One well-known method for reducing dimensionality is principal component analysis (PCA) [Jackson, 1991]. This method uses the eigenvector decomposition of the covariance (or the correlation) matrix. The data are then projected onto the eigenvector corresponding to the maximum eigenvalue (sometimes known as the first principal component) to reduce the data to one dimension. In this case, the eigenvector is one that follows the direction of the maximum variation in the data. Therefore, if we project onto the first principal component, then we will be using the direction that accounts for the maximum amount of variation using only one dimension. We illustrate the notion of projecting data onto a line in Figure 5.43.
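To make this concrete, the following MATLAB sketch projects a data matrix onto its first principal component. The data matrix X and the variable names are hypothetical, chosen only for illustration; any n-by-p data matrix can be substituted.

   % Project an n x p data matrix X onto its first principal component.
   % X is simulated here purely for illustration.
   n = 200; p = 5;
   X = randn(n,p)*diag([3 2 1 1 1]);   % simulated data, unequal variances
   Xc = X - repmat(mean(X),n,1);       % center the data
   [V,D] = eig(cov(Xc));               % eigenvectors of the covariance matrix
   [evals,ind] = sort(diag(D));        % sort eigenvalues in ascending order
   a = V(:,ind(end));                  % direction of maximum variation
   z = Xc*a;                           % 1-D projection of each observation
   hist(z)                             % view the projected data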
We could project onto two dimensions using the eigenvectors corresponding to the largest and second largest eigenvalues. This would project onto the plane spanned by these eigenvectors. As we will see shortly, PCA can be thought of in terms of projection pursuit, where the interesting structure is the variance of the projected data.
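Continuing the sketch above, the plane spanned by the eigenvectors with the two largest eigenvalues yields a 2-D reduction that can be shown in a scatterplot.

   % Continue the previous sketch: project onto the plane spanned by the
   % eigenvectors corresponding to the two largest eigenvalues.
   A = [V(:,ind(end)) V(:,ind(end-1))];   % p x 2 projection matrix
   Z = Xc*A;                              % n x 2 projected data
   plot(Z(:,1),Z(:,2),'.')
   xlabel('First principal component')
   ylabel('Second principal component')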
There are infinitely many planes that we can use to reduce the dimensionality of our data. As we just mentioned, the first two principal components in PCA span one such plane, providing a projection such that the variation in the projected data is maximized over all possible 2-D projections. However, this might not be the best plane for highlighting interesting and informative structure in the data. Structure is defined to be departure from normality and includes such things as clusters, linear structures, holes, and outliers. Thus, the objective is to find a projection plane that provides a 2-D view of our data such that the structure (or departure from normality) is maximized over all possible 2-D projections.
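As a rough illustration of this idea, the following sketch scores many random 2-D projection planes with a crude moment-based measure of non-normality built from sample skewness and excess kurtosis, keeping the plane with the largest score. This index is only a stand-in for illustration, not the projection pursuit index developed later; the sketch reuses Xc, n, and p from the earlier example.

   % Score random 2-D planes with a crude non-normality index based on
   % absolute skewness and excess kurtosis (illustrative only).
   bestScore = -inf;
   for k = 1:500
      [Q,R] = qr(randn(p,2),0);        % random orthonormal p x 2 basis
      Z = Xc*Q;                        % project onto the random plane
      Zs = (Z - repmat(mean(Z),n,1))./repmat(std(Z),n,1);  % standardize
      skew = mean(Zs.^3);              % sample skewness of each coordinate
      kurt = mean(Zs.^4) - 3;          % sample excess kurtosis
      score = sum(abs(skew)) + sum(abs(kurt));
      if score > bestScore             % keep the most non-normal plane
         bestScore = score;
         bestQ = Q;
      end
   end
   plot(Xc*bestQ(:,1),Xc*bestQ(:,2),'.')   % view the best plane found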
We can use the Central Limit Theorem to motivate why we are interested in departures from normality. Linear combinations of data (even Bernoulli data) tend to look normal. Since most low-dimensional projections of the data look approximately Gaussian, any interesting structure (e.g., clusters) must show up in the few non-normal projections.
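A quick way to see this effect is to project high-dimensional Bernoulli data onto a random direction; the histogram of the projected values is approximately bell-shaped. The sample size and dimension below are arbitrary choices for illustration.

   % A random 1-D projection of p-dimensional Bernoulli data looks normal.
   n = 500; p = 20;                 % arbitrary choices for illustration
   B = double(rand(n,p) < 0.5);     % n observations, p Bernoulli variables
   a = randn(p,1);
   a = a/norm(a);                   % random unit projection direction
   hist(B*a, 20)                    % approximately Gaussian histogram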
Friedman and Tukey [1974] describe projection pursuit as a way of searching for and exploring nonlinear structure in multi-dimensional data by examining many 2-D projections. The idea is that 2-D orthogonal projections of the data should reveal the interesting structure in the original data.

