Projection Pursuit

The Andrews curves and parallel coordinate plots are attempts to visualize all of the data points and all of the dimensions at once. An Andrews curve accomplishes this by mapping a data point to a curve. Parallel coordinate displays accomplish this by mapping each observation to a polygonal line with vertices on parallel axes. Another option is to tackle the problem of visualizing multi-dimensional data by reducing the data to a smaller dimension via a suitable projection. These methods reduce the data to 1-D or 2-D by projecting onto a line or a plane and then displaying each point in some suitable graphic, such as a scatterplot. Once the data are reduced to something that can be easily viewed, exploring them for patterns or interesting structure becomes possible.
One well-known method for reducing dimensionality is principal component analysis (PCA) [Jackson, 1991]. This method uses the eigenvector decomposition of the covariance (or the correlation) matrix. The data are then projected onto the eigenvector corresponding to the maximum eigenvalue (sometimes known as the first principal component) to reduce the data to one dimension. In this case, the eigenvector is one that follows the direction of the maximum variation in the data. Therefore, if we project onto the first principal component, then we will be using the direction that accounts for the maximum amount of variation using only one dimension. We illustrate the notion of projecting data onto a line in Figure 5.43.
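To make this concrete, the following MATLAB sketch projects a data matrix onto its first principal component. The data matrix X and the variable names are hypothetical, chosen only for illustration; any n-by-p data matrix can be substituted.

   % Project an n x p data matrix X onto its first principal component.
   % X is simulated here purely for illustration.
   n = 200; p = 5;
   X = randn(n,p)*diag([3 2 1 1 1]);   % simulated data, unequal variances
   Xc = X - repmat(mean(X),n,1);       % center the data
   [V,D] = eig(cov(Xc));               % eigenvectors of the covariance matrix
   [evals,ind] = sort(diag(D));        % sort eigenvalues in ascending order
   a = V(:,ind(end));                  % direction of maximum variation
   z = Xc*a;                           % 1-D projection of each observation
   hist(z)                             % view the projected data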
We could project onto two dimensions using the eigenvectors corresponding to the largest and second largest eigenvalues. This would project onto the plane spanned by these eigenvectors. As we will see shortly, PCA can be thought of in terms of projection pursuit, where the interesting structure is the variance of the projected data.
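Continuing the sketch above, the plane spanned by the eigenvectors with the two largest eigenvalues yields a 2-D reduction that can be shown in a scatterplot.

   % Continue the previous sketch: project onto the plane spanned by the
   % eigenvectors corresponding to the two largest eigenvalues.
   A = [V(:,ind(end)) V(:,ind(end-1))];   % p x 2 projection matrix
   Z = Xc*A;                              % n x 2 projected data
   plot(Z(:,1),Z(:,2),'.')
   xlabel('First principal component')
   ylabel('Second principal component')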
There are infinitely many planes that we can use to reduce the dimensionality of our data. As we just mentioned, the first two principal components in PCA span one such plane, providing a projection such that the variation in the projected data is maximized over all possible 2-D projections. However, this might not be the best plane for highlighting interesting and informative structure in the data. Structure is defined to be departure from normality and includes such things as clusters, linear structures, holes, and outliers. Thus, the objective is to find a projection plane that provides a 2-D view of our data such that the structure (or departure from normality) is maximized over all possible 2-D projections.
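As a rough illustration of this idea, the following sketch scores many random 2-D projection planes with a crude moment-based measure of non-normality built from sample skewness and excess kurtosis, keeping the plane with the largest score. This index is only a stand-in for illustration, not the projection pursuit index developed later; the sketch reuses Xc, n, and p from the earlier example.

   % Score random 2-D planes with a crude non-normality index based on
   % absolute skewness and excess kurtosis (illustrative only).
   bestScore = -inf;
   for k = 1:500
      [Q,R] = qr(randn(p,2),0);        % random orthonormal p x 2 basis
      Z = Xc*Q;                        % project onto the random plane
      Zs = (Z - repmat(mean(Z),n,1))./repmat(std(Z),n,1);  % standardize
      skew = mean(Zs.^3);              % sample skewness of each coordinate
      kurt = mean(Zs.^4) - 3;          % sample excess kurtosis
      score = sum(abs(skew)) + sum(abs(kurt));
      if score > bestScore             % keep the most non-normal plane
         bestScore = score;
         bestQ = Q;
      end
   end
   plot(Xc*bestQ(:,1),Xc*bestQ(:,2),'.')   % view the best plane found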
We can use the Central Limit Theorem to motivate why we are interested in departures from normality. Linear combinations of data (even Bernoulli data) tend to look normal. Since most low-dimensional projections of the data look approximately Gaussian, any interesting structure (e.g., clusters) must show up in the few non-normal projections.
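A quick way to see this effect is to project high-dimensional Bernoulli data onto a random direction; the histogram of the projected values is approximately bell-shaped. The sample size and dimension below are arbitrary choices for illustration.

   % A random 1-D projection of p-dimensional Bernoulli data looks normal.
   n = 500; p = 20;                 % arbitrary choices for illustration
   B = double(rand(n,p) < 0.5);     % n observations, p Bernoulli variables
   a = randn(p,1);
   a = a/norm(a);                   % random unit projection direction
   hist(B*a, 20)                    % approximately Gaussian histogram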
Friedman and Tukey [1974] describe projection pursuit as a way of searching for and exploring nonlinear structure in multi-dimensional data by examining many 2-D projections. The idea is that 2-D orthogonal projections of the data should reveal the interesting structure in the original data.

