Page 198 - Computational Statistics Handbook with MATLAB

Chapter 5: Exploratory Data Analysis


                             any visualization system. It looks at the rules for producing pie charts, bar
                             charts, scatterplots, maps, function plots, and many others.
                              For the reader who is interested in visualization and information design,
                             we recommend the three books by Edward Tufte. His first book, The Visual
                             Display of Quantitative Information [Tufte, 1983], shows how to depict num-
                             bers. The second in the series is called Envisioning Information [Tufte, 1990],
                             and illustrates how to deal with pictures of nouns (e.g., maps, aerial
                             photographs, weather data). The third book is entitled Visual Explanations
                             [Tufte, 1997], and it discusses pictures of verbs, that is, how to depict
                             processes and cause and effect. These three books
                             also provide many examples of good graphics and bad graphics. We highly
                             recommend the book by Wainer [1997] for any statistician, engineer or data
                             analyst. Wainer discusses the subject of good and bad graphics in a way that
                             is accessible to the general reader.
                              Other techniques for visualizing multi-dimensional data have been pro-
                             posed in the literature. One method introduced by Chernoff [1973] represents
                             d-dimensional observations by a cartoon face, where features of the face
                             reflect the values of the measurements. For example, the size and shape of
                             the nose, eyes, mouth, eyebrows, and outline of the face are each determined
                             by the value of one of the measurements. Chernoff faces can be used to detect
                             simple trends in the data, but they are hard to interpret in most cases.
                              Another graphical EDA method that is often used is called brushing.
                             Brushing [Venables and Ripley, 1994; Cleveland, 1993] is an interactive tech-
                             nique where the user can highlight data points on a scatterplot and the same
                             points are highlighted on all other plots. For example, in a scatterplot matrix,
                             a point highlighted in one panel is also highlighted in all of the others.
                             This helps reveal interesting structure that spans several plots.
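
The linking idea behind brushing can be sketched without any interactivity. The following is a minimal illustration, not an interactive brushing tool and not code from the references cited above: the data matrix X, the rectangular selection rule, and the variable name brushed are all assumptions made for this example. The selection is stored as one logical value per observation, so the same index highlights those observations in every panel.

```matlab
% Sketch of the linking idea behind brushing (static, not interactive).
% Assumptions: synthetic data X and a rectangular "brush" on variables 1 and 2.
rng(1);                                % reproducible synthetic data
X = randn(100, 3);                     % 100 observations, 3 variables

% The brush: a rectangle drawn in the (variable 1, variable 2) panel.
brushed = X(:,1) > 0.5 & X(:,2) > 0;   % logical index of the selected rows

% The selection is stored per observation (row), so the same logical
% index highlights those observations in every other panel.
pairs = [1 2; 1 3; 2 3];
for k = 1:size(pairs, 1)
    i = pairs(k, 1);
    j = pairs(k, 2);
    subplot(1, 3, k)
    plot(X(~brushed, i), X(~brushed, j), 'b.')   % unselected points
    hold on
    plot(X(brushed, i), X(brushed, j), 'ro')     % brushed points
    hold off
    xlabel(sprintf('Variable %d', i))
    ylabel(sprintf('Variable %d', j))
end
```

In an interactive tool, the logical index would be updated from mouse input and the panels redrawn; the bookkeeping shown here stays the same.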
                              High-dimensional data can also be viewed using color histograms or data
                             images. Color histograms are described in Wegman [1990]. Data images are
                             discussed in Minotte and West [1998] and are a special case of color histo-
                             grams.
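
As a rough illustration, a data image is usually described as a display of the data matrix itself, with one pixel per entry colored according to its value. The sketch below follows that description and is not taken from Minotte and West [1998]; the synthetic matrix X, the column standardization, and the gray colormap are assumptions made for this example.

```matlab
% Sketch of a data image; X is hypothetical synthetic data.
rng(2);
X = [randn(25, 6); randn(25, 6) + 2];   % two groups of 25 observations

% Standardize each column so that one color scale suits all variables.
mu = mean(X, 1);
sigma = std(X, 0, 1);
n = size(X, 1);
Z = (X - repmat(mu, n, 1)) ./ repmat(sigma, n, 1);

imagesc(Z)             % one pixel per (observation, variable) entry
colormap(gray)
colorbar
xlabel('Variable')
ylabel('Observation')
```

With this synthetic data, the two groups of rows should appear as two bands of different shade, which is the kind of structure such a display is meant to expose.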
                              For more information on the graphical capabilities of MATLAB, we refer
                             the reader to the MATLAB documentation Using MATLAB Graphics. Another
                             excellent resource is the book called Graphics and GUIs with MATLAB by
                             Marchand [1999]. These go into more detail on the graphics capabilities in
                             MATLAB that are useful in data analysis such as lighting, use of the camera,
                             animation, etc.
                              We now describe references that extend the techniques given in this book.

                                • Stem-and-leaf: Various versions and extensions of the stem-and-
                                   leaf plot are available. We show an ordered stem-and-leaf plot in
                                   this book, but ordering is not required. Another version shades the
                                   leaves. Most introductory applied statistics books have information
                                   on stem-and-leaf plots (e.g., Montgomery et al. [1998]). Hunter
                                   [1988] proposes an enhanced stem-and-leaf called the digidot plot.
                                   This combines a stem-and-leaf with a time sequence plot. As data



                            © 2002 by Chapman & Hall/CRC