Page 132 - Computational Statistics Handbook with MATLAB


Chapter 5: Exploratory Data Analysis 119

one could plot a stem-and-leaf with one and with two lines per stem as a way
of discovering more about the data. The stem-and-leaf is useful in that it
approximates the shape of the density, and it also provides a listing of the
data. One can usually recover the original data set from the stem-and-leaf (if
it has not been rounded), unlike the histogram. A disadvantage of the
stem-and-leaf plot is that it is not useful for large data sets, while a histogram is
very effective in reducing and displaying massive data sets.
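
As a rough illustration of the one-line versus two-lines-per-stem comparison, the sketch below builds a text stem-and-leaf display in Python using only the standard library. The function name `stem_and_leaf`, the `lines_per_stem` parameter, and the choice of tens as stems and units as leaves are assumptions made for this example; they are not taken from the text.

```python
from collections import defaultdict


def stem_and_leaf(values, lines_per_stem=1):
    """Return the rows of a stem-and-leaf display as strings.

    Assumption for this sketch: the data are non-negative integers,
    the stem is the tens part and the leaf is the units digit.
    Stems that contain no data are omitted from the display.
    """
    if lines_per_stem not in (1, 2):
        raise ValueError("lines_per_stem must be 1 or 2")
    rows = defaultdict(list)
    for v in sorted(values):
        if v < 0 or v != int(v):
            raise ValueError("this sketch handles non-negative integers only")
        stem, leaf = divmod(int(v), 10)
        # With two lines per stem, leaves 0-4 and 5-9 go on separate rows.
        half = leaf // 5 if lines_per_stem == 2 else 0
        rows[(stem, half)].append(leaf)
    display = []
    for (stem, _half), leaves in sorted(rows.items()):
        leaf_text = "".join(str(leaf) for leaf in leaves)
        display.append(f"{stem:>3} | {leaf_text}")
    return display


data = [12, 15, 21, 27, 27, 30]
print("\n".join(stem_and_leaf(data)))
print()
print("\n".join(stem_and_leaf(data, lines_per_stem=2)))
```

Because every leaf is printed, the original values can be read back from the rows, which is the property the histogram lacks.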

Quantile-Based Plots - Continuous Distributions
If we need to compare two distributions, then we can use the quantile plot to
visually compare them. This is also applicable when we want to compare a
distribution and a sample or to compare two samples. In comparing the
distributions or samples, we are interested in knowing how they are shifted
relative to each other. In essence, we want to know if they are distributed in the
same way. This is important when we are trying to determine the distribution
that generated our data, possibly with the goal of using that information to
generate data for Monte Carlo simulation. Another application where this is
useful is in checking model assumptions, such as normality, before we
conduct our analysis.
In this part, we discuss several versions of quantile-based plots. These
include quantile-quantile plots (q-q plots) and quantile plots (sometimes
called probability plots). Quantile plots for discrete data are discussed next.
The quantile plot is used to compare a sample with a theoretical distribution.
Typically, a q-q plot (sometimes called an empirical quantile plot) is used to
determine whether two random samples are generated by the same
distribution. Note that the q-q plot can also be used to compare a random
sample with a theoretical distribution by generating a sample from the
theoretical distribution as the second sample.
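
The idea of using a generated sample as the second sample can be sketched as follows, in Python with the standard library only. The helper names `qq_pairs_vs_normal` and `pearson_r`, the choice of a standard normal reference, and the seeds are all assumptions for this illustration. The correlation of the paired quantiles is used here only as a crude numerical stand-in for judging the plot by eye.

```python
import math
import random


def qq_pairs_vs_normal(sample, mu=0.0, sigma=1.0, seed=0):
    """Pair the sorted sample with a sorted normal sample of equal size.

    The reference sample is drawn from N(mu, sigma^2) and plays the
    role of the second sample in the q-q plot.
    """
    rng = random.Random(seed)
    reference = [rng.gauss(mu, sigma) for _ in range(len(sample))]
    return list(zip(sorted(reference), sorted(sample)))


def pearson_r(pairs):
    """Pearson correlation of the paired values."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_a * var_b)


# Illustrative data: 200 draws from a normal with mean 10 and sd 2.
data_rng = random.Random(1)
data = [data_rng.gauss(10.0, 2.0) for _ in range(200)]

pairs = qq_pairs_vs_normal(data)
print(round(pearson_r(pairs), 3))
```

If the data come from the same family as the reference, the pairs should fall close to a straight line when plotted. A difference in location or scale changes the intercept and slope of that line rather than its straightness.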

Q-Q Plot
The q-q plot was originally proposed by Wilk and Gnanadesikan [1968] to
visually compare two distributions by graphing the quantiles of one versus
the quantiles of the other. Say we have two data sets consisting of univariate
measurements. We denote the order statistics for the first data set by
$$x_{(1)}, x_{(2)}, \ldots, x_{(n)}.$$
Let the order statistics for the second data set be
$$y_{(1)}, y_{(2)}, \ldots, y_{(m)},$$
with $m \le n$.
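
To make the notation concrete, the sketch below computes the order statistics by sorting and pairs the two sets for plotting. How to pair them when m < n is not stated in this passage. The code assumes one common convention, in which x_(i) is treated as the (i - 0.5)/n quantile and the larger set is linearly interpolated at the probabilities (j - 0.5)/m. The function name `qq_pairs` is hypothetical.

```python
import math


def qq_pairs(x, y):
    """Pair quantiles of x with the order statistics of y (len(y) <= len(x)).

    Assumed convention (not from the text): x_(i) is the (i - 0.5)/n
    quantile, and for each y_(j) the matching x quantile at
    p = (j - 0.5)/m is found by linear interpolation.
    """
    xs, ys = sorted(x), sorted(y)  # order statistics of each data set
    n, m = len(xs), len(ys)
    if m == 0 or m > n:
        raise ValueError("need 1 <= len(y) <= len(x)")
    pairs = []
    for j in range(1, m + 1):
        p = (j - 0.5) / m
        # 1-based fractional position of the p-th quantile in xs,
        # clamped to [1, n] to guard against rounding error.
        pos = min(max(n * p + 0.5, 1.0), float(n))
        lo = math.floor(pos)
        frac = pos - lo
        hi = min(lo + 1, n)
        x_quantile = xs[lo - 1] + frac * (xs[hi - 1] - xs[lo - 1])
        pairs.append((x_quantile, ys[j - 1]))
    return pairs


x = [5, 1, 4, 2, 3, 6]   # n = 6
y = [30, 10, 20]         # m = 3
for x_q, y_j in qq_pairs(x, y):
    print(x_q, y_j)
```

When m = n the interpolation positions coincide with the integers, so the pairs reduce to (x_(i), y_(i)).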
