Page 124 - Computational Statistics Handbook with MATLAB
Chapter 5
Exploratory Data Analysis
5.1 Introduction
Exploratory data analysis (EDA) is quantitative detective work, according to
John Tukey [1977]. EDA is the philosophy that data should first be explored
without assumptions about probabilistic models, error distributions, number
of groups, relationships between the variables, etc., for the purpose of
discovering what they can tell us about the phenomena we are investigating. The
goal of EDA is to explore the data to reveal patterns and features that will
help the analyst better understand, analyze and model the data. With the
advent of powerful desktop computers and high-resolution graphics
capabilities, these methods and techniques are within the reach of every statistician,
engineer and data analyst.
EDA is a collection of techniques for revealing information about the data
and methods for visualizing them to see what they can tell us about the
underlying process that generated them. In most situations, exploratory data
analysis should precede confirmatory analysis (e.g., hypothesis testing,
ANOVA, etc.) to ensure that the analysis is appropriate for the data set. Some
examples and goals of EDA are given below to help motivate the reader.
• If we have a time series, then we would plot the values over time
to look for patterns such as trends, seasonal effects or change
points. In Chapter 11, we have an example of a time series that
shows evidence of a change point in a Poisson process.
• We have observations that relate two characteristics or variables,
and we are interested in how they are related. Is there a linear or
a nonlinear relationship? Are there patterns that can provide
insight into the process that relates the variables? We will see
examples of this application in Chapters 7 and 10.
• We need to provide some summary statistics that describe the data
set. We should look for outliers or aberrant observations that might
contaminate the results. If EDA indicates extreme observations are
© 2002 by Chapman & Hall/CRC