Page 139 - Computational Statistics Handbook with MATLAB
Quantile Plots - Discrete Distributions
Previously, we discussed quantile plots that are primarily used for continuous
data. We would like to have a similar technique for graphically comparing
the shapes of discrete distributions. Hoaglin and Tukey [1985] developed
several plots to accomplish this. We present two of them here: the Poissonness
plot and the binomialness plot. These will enable us to search for evidence
that our discrete data follow a Poisson or a binomial distribution. They
also serve to highlight which points might be incompatible with the model.
Poissonness Plot
Typically, discrete data are whole number values that are often obtained by
counting the number of times something occurs. For example, these might be
the number of traffic fatalities, the number of school-age children in a
household, the number of defects on a hard drive, or the number of errors in a
computer program. We sometimes have the data in the form of a frequency
distribution that lists the possible count values (e.g., 0, 1, 2, …) and the
number of observations that are equal to the count values.
The counts will be denoted as k, with k = 0, 1, …, L. We will assume that
L is the maximum observed value for our discrete variable or counts in the
data set and that we are interested in all counts between 0 and L. Thus, the
total number of observations in the sample is

$$N = \sum_{k=0}^{L} n_k ,$$

where $n_k$ represents the number of observations that are equal to the count k.
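As a minimal sketch of this bookkeeping, the following MATLAB fragment tallies a frequency distribution from raw counts. The vector x holds made-up values chosen only for illustration (it is not data from the text), and the variable names k, nk, and N are assumptions that mirror the notation above.

```matlab
% Hypothetical raw counts (illustrative values only, not from the text).
x = [0 2 1 0 3 1 1 0 2 5];

L = max(x);          % largest observed count
k = 0:L;             % every count value from 0 to L, including unobserved ones

% Tally n_k, the number of observations equal to each count k.
nk = zeros(size(k));
for i = 1:numel(k)
    nk(i) = sum(x == k(i));
end

% Total number of observations, N = sum of n_k over k = 0,...,L.
N = sum(nk);
```

Count values that never occur (here, k = 4) keep a frequency of zero, so the table covers all counts between 0 and L.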
A basic Poissonness plot is constructed by plotting the count values k on
the horizontal axis and

$$\phi(n_k) = \ln(k!\, n_k / N) \qquad (5.2)$$

on the vertical axis. These are plotted as symbols, similar to the quantile plot.
If a Poisson distribution is a reasonable model for the data, then the plotted
points should follow a straight line. Systematic curvature in the plot would
indicate that these data are not consistent with a Poisson distribution. The
values for $\phi(n_k)$ tend to have more variability when $n_k$ is small, so
Hoaglin and Tukey [1985] suggest plotting a special symbol or a ‘1’ to highlight
these points.
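The straight-line behavior follows from the Poisson probability mass function: if the expected value of n_k is N exp(-λ) λ^k / k!, then ln(k! n_k / N) is approximately -λ + k ln(λ), which is linear in k with slope ln(λ). The MATLAB fragment below is one possible sketch of the basic plot under stated assumptions: the frequencies in nk are made-up illustrative values, all variable names are hypothetical, and counts with a frequency of zero are dropped because the logarithm of zero is not finite.

```matlab
% Hypothetical frequency distribution (illustrative values only).
k  = 0:6;
nk = [20 35 25 12 5 2 1];
N  = sum(nk);

% Drop counts with n_k = 0, since ln(0) is not finite.
keep = nk > 0;
kk = k(keep);
nn = nk(keep);

% Vertical coordinate from equation (5.2): phi(n_k) = ln(k! * n_k / N).
phi = log(factorial(kk) .* nn / N);

% Mark points with n_k = 1 using the character '1'; use circles elsewhere.
isOne = (nn == 1);
plot(kk(~isOne), phi(~isOne), 'o')
hold on
if any(isOne)
    text(kk(isOne), phi(isOne), '1')
end
hold off
xlabel('Count k')
ylabel('\phi(n_k)')
```

Labeling the n_k = 1 points separately is a simple way to follow the suggestion above, since those are the points whose vertical positions are least reliable.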
Example 5.7
This example is taken from Hoaglin and Tukey [1985]. In the late 1700’s,
Alexander Hamilton, John Jay and James Madison wrote a series of 77 essays
under the title of The Federalist. These appeared in the newspapers under a
© 2002 by Chapman & Hall/CRC