Page 125 - Computational Statistics Handbook with MATLAB

                                   in the data set, then robust statistical methods might be more
                                   appropriate. In Chapter 10, we illustrate an example where a graph-
                                   ical look at the data indicates the presence of outliers, so we use a
                                   robust method of nonparametric regression.
                                • We have a random sample that will be used to develop a model.
                                   This model will be included in our simulation of a process (e.g.,
                                   simulating a physical process such as a queue). We can use EDA
                                   techniques to help us determine how the data might be distributed
                                   and what model might be appropriate.
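
As a minimal sketch of how a graphical check can suggest a model, the following compares the ordered sample with the quantiles of two candidate distributions. The data are simulated here as a stand-in for an observed sample (for example, service times in a queue), and the sample size, the candidate distributions, and all variable names are assumptions made for illustration. Only base MATLAB functions are used.

```matlab
% Hypothetical sample standing in for observed data (an assumption for illustration)
n = 200;
x = -log(rand(n, 1));                  % exponential(1) draws via the inverse transform

xs = sort(x);                          % order statistics of the sample
p  = ((1:n)' - 0.5) / n;               % plotting positions in (0, 1)

qNormal = sqrt(2) * erfinv(2*p - 1);   % standard normal quantiles (no toolbox needed)
qExp    = -log(1 - p);                 % exponential(1) quantiles

% Correlation between sample and theoretical quantiles:
% a value closer to 1 corresponds to a straighter quantile plot
c = corrcoef(xs, qNormal);  rNormal = c(1, 2);
c = corrcoef(xs, qExp);     rExp    = c(1, 2);

figure
subplot(1, 2, 1)
plot(qNormal, xs, 'o')
xlabel('Normal quantiles'), ylabel('Ordered sample')
title(sprintf('Normal candidate, r = %.3f', rNormal))
subplot(1, 2, 2)
plot(qExp, xs, 'o')
xlabel('Exponential quantiles'), ylabel('Ordered sample')
title(sprintf('Exponential candidate, r = %.3f', rExp))
```

Points that fall close to a straight line suggest that the candidate distribution is plausible up to location and scale. For data like these, we would expect the exponential panel to look straighter than the normal panel.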

                              In this chapter, we discuss graphical EDA and how these techniques
                             can be used to gain information and insights about the data. Some
                             experts include techniques such as smoothing, probability density estima-
                             tion, clustering and principal component analysis in exploratory data analy-
                             sis. We agree that these can be part of EDA, but we do not cover them in this
                             chapter. Smoothing techniques are discussed in Chapter 10 where we present
                             methods for nonparametric regression. Techniques for probability density
                             estimation are presented in Chapter 8, but we do discuss simple histograms
                             in this chapter. Methods for clustering are described in Chapter 9. Principal
                             component analysis is not covered in this book, because the subject is dis-
                             cussed in many linear algebra texts [Strang, 1988; Jackson, 1991].
                              It is likely that some of the visualization methods in this chapter are famil-
                             iar to statisticians, data analysts and engineers. As we stated in Chapter 1,
                             one of the goals of this book is to promote the use of MATLAB for statistical
                             analysis. Some readers might not be familiar with the extensive graphics
                             capabilities of MATLAB, so we endeavor to describe the most useful ones for
                             data analysis. In Section 5.2, we consider techniques for visualizing univari-
                             ate data. These include such methods as stem-and-leaf plots, box plots, histo-
                             grams, and quantile plots. We turn our attention to techniques for visualizing
                             bivariate data in Section 5.3 and include a description of surface plots, scat-
                             terplots and bivariate histograms. Section 5.4 offers several methods for
                             viewing multi-dimensional data, such as slices, isosurfaces, star plots, paral-
                             lel coordinates, Andrews curves, projection pursuit, and the grand tour.






                             5.2 Exploring Univariate Data

                             Two important goals of EDA are 1) to determine a reasonable model for the
                             process that generated the data, and 2) to locate possible outliers in the sam-
                             ple. For example, we might be interested in finding out whether the distribu-
                             tion that generated the data is symmetric or skewed. We might also like to
                             know whether it has one mode or many modes. The univariate visualization
                             techniques presented here will help us answer questions such as these.
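
The questions above can be explored with a short sketch such as the following, which draws one symmetric and one right-skewed sample, plots a histogram of each, and reports the sample skewness. The simulated distributions, the sample size, the number of bins, and the helper name skewfun are assumptions chosen for illustration. We use the base function hist here; newer MATLAB releases recommend histogram instead.

```matlab
% Minimal sketch: compare a symmetric and a right-skewed sample.
% Sample size and distributions are illustrative assumptions.
n = 1000;
xsym  = randn(n, 1);                   % standard normal: symmetric
xskew = -log(rand(n, 1));              % exponential(1): skewed to the right

% Sample skewness: third central moment over the cubed standard deviation
skewfun = @(x) mean((x - mean(x)).^3) / mean((x - mean(x)).^2)^1.5;
gsym  = skewfun(xsym);
gskew = skewfun(xskew);

figure
subplot(1, 2, 1)
hist(xsym, 20)                         % 20 bins, an arbitrary choice
title(sprintf('Symmetric sample, skewness = %.2f', gsym))
subplot(1, 2, 2)
hist(xskew, 20)
title(sprintf('Skewed sample, skewness = %.2f', gskew))
```

A skewness near zero together with a roughly mirror-image histogram is consistent with a symmetric distribution, while a long right tail and a clearly positive skewness point to a skewed one. The number of distinct peaks in the histogram gives a first, bin-dependent impression of the number of modes.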



                            © 2002 by Chapman & Hall/CRC