Page 139 - Computational Statistics Handbook with MATLAB



                             Quantile Plots - Discrete Distributions
                             Previously, we discussed quantile plots that are primarily used for continu-
                             ous data. We would like to have a similar technique for graphically compar-
                             ing the shapes of discrete distributions. Hoaglin and Tukey [1985] developed
                             several plots to accomplish this. We present two of them here: the Poisson-
                             ness plot and the binomialness plot. These will enable us to search for evi-
                             dence that our discrete data follow a Poisson or a binomial distribution. They
                             also serve to highlight which points might be incompatible with the model.
                             Poissonness Plot
                             Typically, discrete data are whole number values that are often obtained by
                             counting the number of times something occurs. For example, these might be
                             the number of traffic fatalities, the number of school-age children in a house-
                             hold, the number of defects on a hard drive, or the number of errors in a com-
                             puter program. We sometimes have the data in the form of a frequency
                             distribution that lists the possible count values (e.g., 0, 1, 2, …) and the num-
                             ber of observations that are equal to the count values.
                              The counts will be denoted as k, with k = 0, 1, …, L. We will assume that
                             L is the maximum observed value for our discrete variable or counts in the
                             data set and that we are interested in all counts between 0 and L. Thus, the
                             total number of observations in the sample is
                                       $$N = \sum_{k=0}^{L} n_k ,$$
                             where $n_k$ represents the number of observations that are equal to the count k.
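
The frequency distribution described above can be tabulated directly from raw counts. The following is a minimal sketch, assuming the data arrive as a vector of nonnegative integers; the vector x and its values are hypothetical and serve only to illustrate the notation k, n_k, L, and N.

```matlab
% Hypothetical raw counts (illustrative values only, not from the text).
x = [0 2 1 1 3 0 1 2 4 1 0 2];

L  = max(x);                    % maximum observed count
k  = 0:L;                       % count values 0, 1, ..., L

% accumarray tallies how often each value occurs; the +1 shifts the
% counts so that k = 0 maps to index 1. Values that never occur get 0.
nk = accumarray(x(:) + 1, 1, [L + 1, 1])';

N  = sum(nk);                   % total number of observations

% Show the frequency distribution: first row is k, second row is n_k.
disp([k; nk])
```

Because every observation falls in exactly one of the bins 0, 1, …, L, the total N computed from the table agrees with the length of the raw data vector.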
                              A basic Poissonness plot is constructed by plotting the count values k on
                             the horizontal axis and
                                       $$\phi(n_k) = \ln\left( \frac{k!\, n_k}{N} \right) \qquad (5.2)$$
                             on the vertical axis. These are plotted as symbols, similar to the quantile plot.
                             If a Poisson distribution is a reasonable model for the data, then the plotted
                             points should follow a straight line. Systematic curvature in the plot would
                             indicate that these data are not consistent with a Poisson distribution. The
                             values for $\phi(n_k)$ tend to have more variability when $n_k$ is small, so
                             Hoaglin and Tukey [1985] suggest plotting a special symbol or a ‘1’ to
                             highlight these points.
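
The reason for the straight line can be seen from the Poisson probability function: if the counts follow a Poisson distribution with parameter λ, the expected value of n_k is N e^{-λ} λ^k / k!, and substituting this into equation (5.2) gives approximately -λ + k ln λ, which is linear in k with slope ln λ and intercept -λ. The sketch below computes equation (5.2) and draws a basic Poissonness plot. The frequency table is hypothetical, the choice of '*' as the special symbol for n_k = 1 is an assumption, and the unweighted least-squares line is added only as a rough visual guide, not as a recommended estimator.

```matlab
% Hypothetical frequency table (illustrative values only, not from the text).
k  = 0:6;                      % count values 0, 1, ..., L
nk = [22 37 28 15 6 2 1];      % n_k: number of observations equal to k
N  = sum(nk);                  % total number of observations

% Counts with n_k = 0 would give ln(0) = -Inf, so they are left out.
keep = nk > 0;
kp   = k(keep);
nkp  = nk(keep);

% Equation (5.2): phi(n_k) = ln(k! n_k / N).
phi = log(factorial(kp) .* nkp / N);

% Rough reference line (unweighted fit); under a Poisson model the
% slope is approximately ln(lambda).
p = polyfit(kp, phi, 1);

% Ordinary points as circles; points with n_k = 1 as a special symbol.
one = (nkp == 1);
plot(kp(~one), phi(~one), 'o', kp(one), phi(one), '*', kp, polyval(p, kp), '-')
xlabel('Count value k')
ylabel('\phi(n_k)')
title('Poissonness plot (hypothetical data)')
```

Points that drift systematically away from the reference line suggest a departure from the Poisson model, while scatter confined to the starred points is less informative because those values rest on a single observation.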
                             Example 5.7
                             This example is taken from Hoaglin and Tukey [1985]. In the late 1700s, Alex-
                             ander Hamilton, John Jay, and James Madison wrote a series of 77 essays
                             under the title of The Federalist. These appeared in the newspapers under a
                            © 2002 by Chapman & Hall/CRC