Page 184 - Computational Statistics Handbook with MATLAB

Chapter 5: Exploratory Data Analysis                            171


$B_k$ is a box in the projection plane.
$I_{B_k}$ is the indicator function for region $B_k$.
$\eta_j = \pi j / 36$, $j = 0, \ldots, 8$, is the angle by which the data are rotated in
   the plane before being assigned to regions $B_k$.
$\alpha(\eta_j)$ and $\beta(\eta_j)$ are given by

$$
\begin{aligned}
\alpha(\eta_j) &= \alpha \cos \eta_j - \beta \sin \eta_j \\
\beta(\eta_j) &= \alpha \sin \eta_j + \beta \cos \eta_j
\end{aligned}
\tag{5.14}
$$

$c$ is a scalar that determines the size of the neighborhood around
   $(\alpha^*, \beta^*)$ that is visited in the search for planes that provide better
   values for the projection pursuit index.
$v$ is a vector uniformly distributed on the unit d-dimensional sphere.
half specifies the number of steps without an increase in the projection
   index, at which time the value of the neighborhood is halved.
$m$ represents the number of searches or random starts to find the best
   plane.
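The rotation in Equation 5.14 and the random direction $v$ can be illustrated with a short sketch. It is written in Python purely for illustration; the function names `rotation_angles`, `rotate_plane`, and `random_unit_vector` are hypothetical and do not come from the text. Drawing $v$ by normalizing a vector of independent standard normal variates is one standard way to obtain a uniform direction on the unit sphere, assumed here rather than taken from the source.

```python
import math
import random

def rotation_angles():
    """Angles eta_j = pi*j/36 for j = 0, ..., 8 (0 to 40 degrees in 5-degree steps)."""
    return [math.pi * j / 36 for j in range(9)]

def rotate_plane(alpha, beta, eta):
    """Apply Equation 5.14 to the projection vectors alpha and beta."""
    cs, sn = math.cos(eta), math.sin(eta)
    alpha_rot = [cs * a - sn * b for a, b in zip(alpha, beta)]
    beta_rot = [sn * a + cs * b for a, b in zip(alpha, beta)]
    return alpha_rot, beta_rot

def random_unit_vector(d, rng=random):
    """Draw v uniformly on the unit sphere in d dimensions (assumed method:
    normalize a vector of independent standard normal variates)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 0.0:  # guard against the (practically impossible) zero vector
            return [x / norm for x in g]
```

If `alpha` and `beta` are orthonormal, the rotated pair is orthonormal as well, so each $\eta_j$ only turns the coordinate axes within the same plane.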



Projection Pursuit Index
Posse [1995a, 1995b] developed an index based on the chi-square. The plane
is first divided into 48 regions or boxes $B_k$ that are distributed in rings. See
Figure 5.44 for an illustration of how the plane is partitioned. All regions have
the same angular width of 45 degrees and the inner regions have the same
radial width of $(2 \log 6)^{1/2} / 5$. This choice for the radial width provides
regions with approximately the same probability for the standard bivariate
normal distribution. The regions in the outer ring have probability $1/48$. The
regions are constructed in this way to account for the radial symmetry of the
bivariate normal distribution.
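The region probabilities can be checked numerically. The sketch below is in Python for illustration only, and its names are hypothetical. It relies on the fact that the radial distance $R$ of a standard bivariate normal satisfies $P(R > r) = \exp(-r^2/2)$. It also assumes five inner rings of width $(2 \log 6)^{1/2}/5$ plus one unbounded outer ring, each split into eight 45-degree sectors, which is what 48 regions of equal angular width imply.

```python
import math

N_SECTORS = 8                                # 360 degrees / 45 degrees
N_INNER_RINGS = 5                            # assumed: 48 / 8 = 6 rings, the last one unbounded
R_OUTER = math.sqrt(2.0 * math.log(6.0))     # inner edge of the outer ring
RING_WIDTH = R_OUTER / N_INNER_RINGS         # radial width of each inner ring

def ring_probabilities():
    """Standard bivariate normal probability of each ring, innermost first."""
    edges = [m * RING_WIDTH for m in range(N_INNER_RINGS + 1)]
    surv = [math.exp(-0.5 * r * r) for r in edges]       # P(R > r) at each edge
    probs = [surv[m] - surv[m + 1] for m in range(N_INNER_RINGS)]
    probs.append(surv[-1])                               # outer ring: P(R > R_OUTER) = 1/6
    return probs

def region_probabilities():
    """Probability of each of the 48 regions, ordered ring by ring."""
    return [p / N_SECTORS for p in ring_probabilities() for _ in range(N_SECTORS)]
```

Under these assumptions the eight outer regions each receive exactly $(1/6)/8 = 1/48$, matching the text, and the inner values can be compared against $1/48$ to see how close the approximation is.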
                              Posse [1995a, 1995b] provides the population version of the projection
                             index. We present only the empirical version here, because that is the one that
                             must be implemented on the computer. The projection index is given by
$$
PI_{\chi^2}(\alpha, \beta) = \frac{1}{9} \sum_{j=0}^{8} \sum_{k=1}^{48} \frac{1}{c_k} \left[ \frac{1}{n} \sum_{i=1}^{n} I_{B_k}\!\left( z_i^{\alpha(\eta_j)}, z_i^{\beta(\eta_j)} \right) - c_k \right]^2 . \tag{5.15}
$$
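A minimal sketch of the empirical index in Equation 5.15 follows, in Python for illustration only. Several points are assumptions not stated on this page: `Z` holds the sphered observations as rows; $c_k$ is the standard bivariate normal probability of region $B_k$; the sectors start at angle zero; and region boundaries are treated as half-open. The function names are hypothetical.

```python
import math

N_SECTORS, N_RINGS = 8, 6
RING_WIDTH = math.sqrt(2.0 * math.log(6.0)) / 5.0

def region_index(x, y):
    """Return k in 0..47 for the box containing the projected point (x, y)."""
    ring = min(int(math.hypot(x, y) / RING_WIDTH), N_RINGS - 1)
    angle = math.atan2(y, x) % (2.0 * math.pi)
    sector = min(int(angle / (math.pi / 4.0)), N_SECTORS - 1)
    return ring * N_SECTORS + sector

def region_probabilities():
    """Assumed c_k: normal probability of each region, using P(R > r) = exp(-r^2/2)."""
    surv = [math.exp(-0.5 * (m * RING_WIDTH) ** 2) for m in range(6)]
    ring_p = [surv[m] - surv[m + 1] for m in range(5)] + [surv[5]]
    return [p / N_SECTORS for p in ring_p for _ in range(N_SECTORS)]

def chi2_index(Z, alpha, beta):
    """Empirical chi-square projection index for the plane spanned by alpha, beta."""
    n = len(Z)
    c = region_probabilities()
    total = 0.0
    for j in range(9):                       # nine rotations eta_j = pi*j/36
        eta = math.pi * j / 36.0
        cs, sn = math.cos(eta), math.sin(eta)
        a = [cs * u - sn * v for u, v in zip(alpha, beta)]   # alpha(eta_j)
        b = [sn * u + cs * v for u, v in zip(alpha, beta)]   # beta(eta_j)
        counts = [0] * 48
        for z in Z:                          # project each observation and bin it
            x = sum(zi * ai for zi, ai in zip(z, a))
            y = sum(zi * bi for zi, bi in zip(z, b))
            counts[region_index(x, y)] += 1
        total += sum((counts[k] / n - c[k]) ** 2 / c[k] for k in range(48))
    return total / 9.0
```

As a sanity check on the sketch, if every observation projects to the origin, all of the mass falls in a single innermost box and the index reduces to $(1 - c_1)/c_1$ for that box, far from the value near zero expected for normal data.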
                             The chi-square projection index is not affected by the presence of outliers.
                             This means that an interesting projection obtained using this index will not
                             be one that is interesting solely because of outliers, unlike some of the other
                             indexes (see Appendix C). It is sensitive to distributions that have a hole in
                             the core, and it will also yield projections that contain clusters. The chi-square
                             projection pursuit index is fast and easy to compute, making it appropriate

                            © 2002 by Chapman & Hall/CRC