
3. Newton’s Method and Scoring
deduce this fact by applying the moment formula (3.13) in the conditional
density computation
$$
\frac{\dbinom{2n}{n_1 \cdots n_k}\,\dfrac{\Gamma(\alpha_.)}{\prod_{i=1}^{k}\Gamma(\alpha_i)}\,
      \prod_{i=1}^{k} p_i^{\,n_i+\alpha_i-1}}
     {\dbinom{2n}{n_1 \cdots n_k}\,\dfrac{\Gamma(\alpha_.)}{\prod_{i=1}^{k}\Gamma(\alpha_i)}\,
      \displaystyle\int_{\Delta_k}\prod_{i=1}^{k} q_i^{\,n_i+\alpha_i-1}\,dq}
\;=\; \frac{\Gamma(2n+\alpha_.)}{\prod_{i=1}^{k}\Gamma(n_i+\alpha_i)}\,
      \prod_{i=1}^{k} p_i^{\,n_i+\alpha_i-1}.
$$
The posterior mean $(n_i + \alpha_i)/(2n + \alpha_.)$ is a strongly consistent, asymptotically unbiased estimator of $p_i$.
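As a quick numerical illustration of this posterior mean, here is a minimal Python sketch; the allele counts and the uniform prior ($\alpha_i = 1$) are hypothetical choices, not data from the text.

```python
import numpy as np

# Hypothetical allele counts n_1, ..., n_k from a sample of 2n genes (k = 3 alleles)
counts = np.array([58.0, 31.0, 11.0])
alpha = np.ones_like(counts)        # uniform Dirichlet prior, alpha_i = 1

two_n = counts.sum()                # 2n, the number of genes sampled
alpha_dot = alpha.sum()             # alpha_. = alpha_1 + ... + alpha_k

# Posterior is Dirichlet(n_i + alpha_i); its mean estimates p_i
posterior_mean = (counts + alpha) / (two_n + alpha_dot)
print(posterior_mean)               # shrinks the raw frequencies n_i / 2n toward the prior mean
```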
                                The primary drawback of being Bayesian in this situation is that there
                              is no obvious way of selecting a reasonable prior. However, if data from
                              several distinct populations are available, then one can select an appropriate
                              prior empirically. Consider the marginal distribution of the allele counts
$(N_1,\ldots,N_k)^t$ in a sample of genes from a single population. Integrating
out the prior on the allele frequency vector $p = (p_1,\ldots,p_k)^t$ yields the
predictive distribution [16]
$$
\Pr(N_1 = n_1,\ldots,N_k = n_k)
\;=\; \binom{2n}{n_1 \cdots n_k}\,
\frac{\Gamma(\alpha_.)}{\Gamma(2n+\alpha_.)}
\prod_{i=1}^{k}\frac{\Gamma(n_i+\alpha_i)}{\Gamma(\alpha_i)}.
\tag{3.14}
$$
                              This distribution is known as the Dirichlet-multinomial distribution.
                              Its parameters are the α’s rather than the p’s.
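For numerical work it is convenient to evaluate (3.14) on the log scale through log-gamma functions. The sketch below (using SciPy's gammaln; the function name and the example counts are ours and purely illustrative) returns the log of the Dirichlet-multinomial probability for one population's counts.

```python
import numpy as np
from scipy.special import gammaln

def dirichlet_multinomial_logpmf(counts, alpha):
    """Log of the predictive probability (3.14) for one population's allele counts."""
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    two_n = counts.sum()                     # 2n genes sampled
    alpha_dot = alpha.sum()                  # alpha_.
    # log multinomial coefficient (2n choose n_1 ... n_k)
    log_coef = gammaln(two_n + 1.0) - gammaln(counts + 1.0).sum()
    # log of Gamma(alpha_.)/Gamma(2n + alpha_.) * prod_i Gamma(n_i + alpha_i)/Gamma(alpha_i)
    log_ratio = (gammaln(alpha_dot) - gammaln(two_n + alpha_dot)
                 + (gammaln(counts + alpha) - gammaln(alpha)).sum())
    return log_coef + log_ratio

# Example with hypothetical counts and a uniform prior
print(dirichlet_multinomial_logpmf([58, 31, 11], [1.0, 1.0, 1.0]))
```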
With independent data from several distinct populations, one can estimate
the parameter vector $\alpha = (\alpha_1,\ldots,\alpha_k)^t$ of the Dirichlet-multinomial
                              distribution by maximum likelihood. The estimated α can then be recycled
                              to compute the posterior means of the allele frequencies for the separate
                              populations. This interplay between frequentist and Bayesian techniques is
                              typical of the empirical Bayes method.
                                To estimate the parameter vector α characterizing the prior, we again
revert to Newton’s method. We need the score $dL(\alpha)$ and the observed
information $-d^2 L(\alpha)$ for each population. Based on the likelihood (3.14),
                              elementary calculus shows that the score has entries
$$
\frac{\partial}{\partial\alpha_i} L(\alpha)
\;=\; \psi(\alpha_.) - \psi(2n+\alpha_.) + \psi(n_i+\alpha_i) - \psi(\alpha_i),
\tag{3.15}
$$
where $\psi(s) = \frac{d}{ds}\ln\Gamma(s)$ is the digamma function [9]. The observed information has entries
$$
-\frac{\partial^2}{\partial\alpha_i\,\partial\alpha_j} L(\alpha)
\;=\; -\psi'(\alpha_.) + \psi'(2n+\alpha_.)
      - 1_{\{i=j\}}\bigl[\psi'(n_i+\alpha_i) - \psi'(\alpha_i)\bigr],
\tag{3.16}
$$
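As a concrete sketch of the resulting Newton iteration, the score (3.15) and observed information (3.16) can be coded with SciPy's digamma and polygamma functions. The per-population count matrix below is hypothetical, and the loop is bare Newton's method with only a crude positivity safeguard; a careful implementation would add step halving on the log-likelihood and a proper convergence diagnostic.

```python
import numpy as np
from scipy.special import digamma, polygamma

def score_and_obs_info(alpha, counts):
    """Score dL(alpha) and observed information -d^2 L(alpha), summed over populations.

    counts has shape (populations, k); each row holds one population's allele
    counts n_1, ..., n_k from 2n sampled genes.
    """
    k = alpha.size
    score = np.zeros(k)
    info = np.zeros((k, k))
    alpha_dot = alpha.sum()
    trigamma = lambda x: polygamma(1, x)     # psi'(x)
    for n in counts:
        two_n = n.sum()
        # score entries, equation (3.15)
        score += (digamma(alpha_dot) - digamma(two_n + alpha_dot)
                  + digamma(n + alpha) - digamma(alpha))
        # observed information entries, equation (3.16)
        info += -trigamma(alpha_dot) + trigamma(two_n + alpha_dot)
        info -= np.diag(trigamma(n + alpha) - trigamma(alpha))
    return score, info

# Hypothetical allele counts from three populations (k = 3 alleles)
counts = np.array([[58.0, 31.0, 11.0],
                   [47.0, 40.0, 13.0],
                   [63.0, 25.0, 12.0]])

alpha = np.ones(3)                           # starting value
for _ in range(100):
    score, info = score_and_obs_info(alpha, counts)
    step = np.linalg.solve(info, score)      # Newton step: alpha + (-d^2 L)^{-1} dL
    while np.any(alpha + step <= 0):         # crude safeguard: keep alpha positive
        step *= 0.5
    alpha = alpha + step
    if np.max(np.abs(step)) < 1e-8:
        break

print(alpha)   # estimated prior; recycle via (n_i + alpha_i)/(2n + alpha_.) per population
```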