Page 234 - Applied Probability
P. 234

10. Molecular Phylogeny
                              220
                              when codon site i is fast evolving. If the sites are numbered 1 through m,
                              then a Gibbs random field assigns the prior probability
                                                                    e
                                                   Pr(C = c)=
                                                                      H(d)

                                                                     e
                                                                    d H(c)
                              to the random vector C =(C 1 ,...,C m ) using a potential function H(c).
                              One fruitful choice of H(c)is
                                                         m      r    m−j

                                                           c i +        1 {c i =c i+j } .
                                            H(c)= θ 0              θ j
                                                        i=1    j=1   i=1
                              This multilevel logistic model [3] accounts for the proportion of slow evolv-
                              ing codon sites through the parameter θ 0 . Spatial correlation at a distance
                              j ≥ 1 is accounted for by the parameter θ j . The larger θ j is, the stronger
                              the correlation between codon sites i and i + j. Common sense suggests
                              that in most circumstances θ 1 ≥ θ 2 ≥· · · ≥ θ r ≥ 0. The special case r =1
                              is just the Ising model of statistical mechanics [21].
                                To compute the likelihood of the data under the rate variation model, we
                                                                      m
                              follow Schadt and Lange [19], and let L c =    denote the likelihood

                                                                      i=1  L c i
                              of the data given the rate variation vector C = c =(c 1 ,...,c m ). The overall
                              likelihood is then
                                              1    	      H(c)
                                    L =                L ce
                                              e H(d)
                                             d      c
                                                     1     1   m         r m−j
                                              1
                                       =               ···       L c i e θ 0c i  e θ j 1 {c i =c i+j } .
                                              e H(d)
                                             d     c 1 =0  c m =0 i=1   j=1 i=1
                              Computation of either the partition function    d  e H(d)  or the numerator
                                  L ce   of L reduces to the evaluation of a multiple sum of products of
                                     H(c)
                                 c
                              arrays. Once again we compute the multiple sum as an iterated sum. Con-
                              sider the numerator of L. In the forward algorithm, we eliminate the indices
                              c 1 , ··· ,c m in the indicated order. Starting with the array a 0 (c 1 ,...,c r )= 1
                              at step 0, at step i we create the new array a i (c i+1 ,... ,c min{i+r,m} )via
                                       a i (c i+1 ,...,c min{i+r,m} )
                                        1         min{r,m−i}

                                    =      L c i e θ 0 c i  e θ j 1 {c i =c i+j } a i−1 (c i ,...,c min{i−1+r,m} )
                                       c i =0        j=1
                              and discard the old array a i−1 (c i ,...,c min{i−1+r,m} ). The final array a m
                                                                               H(c)
                              has zero dimension and furnishes the numerator  L ce  .
                                                                           c
                                In computing a i (c i+1 ,...,c min{i+r,m} ) it is advantageous to perform the
                              indicated array multiplications pairwise, starting with the two smallest ar-
                                      e θ 0 c i , multiplying the resulting product against e θ 1 1 {c i =c i+1 } , and
                              rays L c i
   229   230   231   232   233   234   235   236   237   238   239