Page 124 - Computational Statistics Handbook with MATLAB

Chapter 5




                             Exploratory Data Analysis










                             5.1 Introduction
                             Exploratory data analysis (EDA) is quantitative detective work, according to
                             John Tukey [1977]. EDA is the philosophy that data should first be explored
                             without assumptions about probabilistic models, error distributions, number
                             of groups, relationships between the variables, etc., for the purpose of
                             discovering what they can tell us about the phenomena we are investigating.
                             The goal of EDA is to explore the data to reveal patterns and features that
                             will help the analyst better understand, analyze and model the data. With the
                             advent of powerful desktop computers and high-resolution graphics
                             capabilities, these methods and techniques are within the reach of every
                             statistician, engineer and data analyst.
                              EDA is a collection of techniques for revealing information about the data
                             and methods for visualizing them to see what they can tell us about the
                             underlying process that generated them. In most situations, exploratory data
                             analysis should precede confirmatory analysis (e.g., hypothesis testing,
                             ANOVA, etc.) to ensure that the analysis is appropriate for the data set. Some
                             examples and goals of EDA are given below to help motivate the reader.

                                • If we have a time series, then we would plot the values over time
                                   to look for patterns such as trends, seasonal effects or change
                                   points. In Chapter 11, we have an example of a time series that
                                   shows evidence of a change point in a Poisson process.
                                • We have observations that relate two characteristics or variables,
                                   and we are interested in how they are related. Is there a linear or
                                   a nonlinear relationship? Are there patterns that can provide
                                   insight into the process that relates the variables? We will see
                                   examples of this application in Chapters 7 and 10.
                                • We need to provide some summary statistics that describe the data
                                   set. We should look for outliers or aberrant observations that might
                                   contaminate the results. If EDA indicates extreme observations are






                            © 2002 by Chapman & Hall/CRC