Page 198 - Computational Statistics Handbook with MATLAB
Chapter 5: Exploratory Data Analysis 185
any visualization system. It looks at the rules for producing pie charts, bar
charts, scatterplots, maps, function plots, and many others.
For the reader who is interested in visualization and information design,
the three books by Edward Tufte are recommended. His first book, The Visual
Display of Quantitative Information [Tufte, 1983], shows how to depict num-
bers. The second in the series is called Envisioning Information [Tufte, 1990],
and illustrates how to deal with pictures of nouns (e.g., maps, aerial photo-
graphs, weather data). The third book is entitled Visual Explanations [Tufte,
1997], and it discusses how to illustrate pictures of verbs. These three books
also provide many examples of good graphics and bad graphics. We highly
recommend the book by Wainer [1997] for any statistician, engineer or data
analyst. Wainer discusses the subject of good and bad graphics in a way that
is accessible to the general reader.
Other techniques for visualizing multi-dimensional data have been proposed
in the literature. One method, introduced by Chernoff [1973], represents
each d-dimensional observation by a cartoon face whose features reflect the
values of the measurements. The size and shape of the nose, eyes, mouth,
eyebrows, and outline of the face are determined by the values of the
measurements. Chernoff faces can be used to detect simple trends in the
data, but they are hard to interpret in most cases.
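As a rough illustration of the idea, the following sketch draws one face per observation. It assumes that the Statistics and Machine Learning Toolbox is installed, since it relies on that toolbox's glyphplot function with the 'face' glyph; the data are synthetic and the variable names are ours, not part of the original discussion.

```matlab
% Sketch only: Chernoff-style faces for synthetic data.
% ASSUMPTION: the Statistics and Machine Learning Toolbox is available
% (glyphplot is not part of base MATLAB).
rng(3);                      % fixed seed so the figure is reproducible
X = rand(9, 6);              % 9 observations, 6 measurements each (made up)

figure;
glyphplot(X, 'glyph', 'face');   % one face per row of X
title('Chernoff faces for synthetic data (sketch)');
```

Each row of X becomes one face, and each column is mapped to a facial feature. Which column drives which feature is decided by the function's defaults, so the assignment should be checked before any pattern is read into the faces.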
Another graphical EDA method that is often used is called brushing.
Brushing [Venables and Ripley, 1994; Cleveland, 1993] is an interactive
technique in which the user highlights data points on one scatterplot and
the same points are highlighted on all other plots. For example, in a
scatterplot matrix, a point highlighted in one panel is also highlighted in
every other panel. This helps reveal interesting structure across plots.
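The effect of brushing can be imitated without interaction. The sketch below uses only base MATLAB graphics and synthetic data: a logical index vector, here called brushed (our name), stands in for the points the user would select with the mouse, and those points are drawn with a different marker in every panel of a scatterplot matrix. It is an illustration of the linked-highlighting idea, not an implementation of an interactive brush.

```matlab
% Sketch only: static imitation of brushing in a scatterplot matrix.
rng(1);                          % fixed seed; the data are synthetic
n = 100;                         % number of observations (assumed)
d = 3;                           % number of variables (assumed)
X = randn(n, d);

% Stand-in for a mouse selection: "brush" the points with x1 > 1.
brushed = X(:,1) > 1;

figure;
for i = 1:d
    for j = 1:d
        subplot(d, d, (i-1)*d + j);
        if i == j
            % Diagonal panels show the marginal distribution.
            histogram(X(:,i));
        else
            % Unbrushed points in blue, brushed points as red circles.
            plot(X(~brushed,j), X(~brushed,i), 'b.');
            hold on
            plot(X(brushed,j), X(brushed,i), 'ro');
            hold off
        end
    end
end
```

Because the same index vector is used in every panel, the red points show where the selected observations fall with respect to each pair of variables. For true interactive selection, recent versions of MATLAB provide a brush mode for figures (brush on).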
High-dimensional data can also be viewed using color histograms or data
images. Color histograms are described in Wegman [1990]. Data images are
discussed in Minotte and West [1998] and are a special case of color histo-
grams.
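A minimal sketch of a data image follows, under the assumption that each observation is displayed as a row of pixels and each variable as a column, with the gray level of a pixel encoding the standardized value. The data, the standardization step, and the variable names are ours and are meant only to illustrate the idea; see Minotte and West [1998] for the actual method.

```matlab
% Sketch only: a simple data image for synthetic two-group data.
rng(2);                                 % fixed seed; the data are made up
X = [randn(20,5); randn(20,5) + 3];     % 40 observations, 5 variables

% Standardize each column so gray levels are comparable across variables.
Xc = bsxfun(@minus, X, mean(X,1));      % center the columns
Z  = bsxfun(@rdivide, Xc, std(X,0,1));  % scale to unit standard deviation

figure;
imagesc(Z);                             % one pixel per data value
colormap(gray);
colorbar;
xlabel('Variable');
ylabel('Observation');
title('Data image of synthetic data (sketch)');
```

In this example the two groups appear as two horizontal bands of different shading, which is the kind of structure a data image is intended to expose.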
For more information on the graphical capabilities of MATLAB, we refer
the reader to the MATLAB documentation Using MATLAB Graphics. Another
excellent resource is the book Graphics and GUIs with MATLAB by
Marchand [1999]. Both go into more detail on the graphics capabilities in
MATLAB that are useful in data analysis, such as lighting, use of the
camera, and animation.
We now describe references that extend the techniques given in this book.
• Stem-and-leaf: Various versions and extensions of the stem-and-
leaf plot are available. We show an ordered stem-and-leaf plot in
this book, but ordering is not required. Another version shades the
leaves. Most introductory applied statistics books have information
on stem-and-leaf plots (e.g., Montgomery et al. [1998]). Hunter
[1988] proposes an enhanced stem-and-leaf called the digidot plot.
This combines a stem-and-leaf with a time sequence plot. As data
© 2002 by Chapman & Hall/CRC