Page 225 - Applied Probability

10. Molecular Phylogeny
                              10.5 A Nucleotide Substitution Model
                              Models for nucleotide substitution are of great importance in molecular
                              evolution. Kimura [13], among others, views the changes occurring at a
                              single position or site as a continuous-time Markov chain involving the four
                              bases A, G, C, and T. The matrix Λ below gives the transition rates under
                              a generalization of Kimura’s model of neutral evolution. In this matrix the
                              rows and columns are labeled by the four states in the order A, G, C, and
                              T from top to bottom and left to right.
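                              \[
                              \Lambda \;=\;
                              \begin{array}{c|cccc}
                                & A & G & C & T \\ \hline
                              A & -(\alpha+\gamma+\lambda) & \alpha & \gamma & \lambda \\
                              G & \epsilon & -(\epsilon+\gamma+\lambda) & \gamma & \lambda \\
                              C & \delta & \kappa & -(\delta+\kappa+\beta) & \beta \\
                              T & \delta & \kappa & \sigma & -(\delta+\kappa+\sigma)
                              \end{array}\,.
                              \]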
                                Without further restrictions, this chain does not satisfy detailed balance.
                              If we impose the additional constraints βγ = λσ and αδ = εκ, then the
                              distribution
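                              \[
                              \begin{aligned}
                              \pi_A &= \frac{\delta}{\gamma+\delta+\kappa+\lambda}, &
                              \pi_G &= \frac{\kappa}{\gamma+\delta+\kappa+\lambda}, \\
                              \pi_C &= \frac{\gamma}{\gamma+\delta+\kappa+\lambda}, &
                              \pi_T &= \frac{\lambda}{\gamma+\delta+\kappa+\lambda}
                              \end{aligned}
                              \tag{10.11}
                              \]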
                              satisfies detailed balance. To verify detailed balance, one must check six
                              equalities of the type (10.10). For instance, π_A α = π_G ε follows directly
                              from the definitions of π_A and π_G and the constraint αδ = εκ. Kolmogorov’s
                              criterion indicates that the two stated constraints are necessary as well as
                              sufficient for detailed balance.
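
The six detailed balance equalities can also be checked mechanically. The following is a minimal sketch in Python, not part of the original text. It assumes the G-to-A rate is named `epsilon`, and the numerical rates are illustrative values chosen only so that βγ = λσ and αδ = εκ hold. Exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction as F

STATES = ["A", "G", "C", "T"]  # row and column order of the generator


def build_generator(alpha, epsilon, gamma, lam, delta, kappa, beta, sigma):
    """Return the 4x4 rate matrix with states ordered A, G, C, T."""
    return [
        [-(alpha + gamma + lam), alpha, gamma, lam],
        [epsilon, -(epsilon + gamma + lam), gamma, lam],
        [delta, kappa, -(delta + kappa + beta), beta],
        [delta, kappa, sigma, -(delta + kappa + sigma)],
    ]


def stationary(gamma, lam, delta, kappa):
    """Candidate distribution (10.11), in the order A, G, C, T."""
    total = gamma + delta + kappa + lam
    return [delta / total, kappa / total, gamma / total, lam / total]


def detailed_balance_holds(pi, rates):
    """Check pi_i * rate(i -> j) == pi_j * rate(j -> i) for all six pairs."""
    return all(
        pi[i] * rates[i][j] == pi[j] * rates[j][i]
        for i in range(4)
        for j in range(i + 1, 4)
    )


# Illustrative rates (an assumption): beta*gamma = lam*sigma = 4 and
# alpha*delta = epsilon*kappa = 6.
params = dict(alpha=F(2), epsilon=F(3), gamma=F(1), lam=F(2),
              delta=F(3), kappa=F(2), beta=F(4), sigma=F(2))

Lam = build_generator(**params)
pi = stationary(params["gamma"], params["lam"], params["delta"], params["kappa"])

# Global balance: every component of the row vector pi * Lam.
balance = [sum(pi[i] * Lam[i][j] for i in range(4)) for j in range(4)]

# Breaking the constraint alpha*delta = epsilon*kappa destroys detailed balance.
bad = build_generator(**{**params, "alpha": F(5)})

print(detailed_balance_holds(pi, Lam))   # constraints satisfied
print(detailed_balance_holds(pi, bad))   # constraint violated
```

Under these assumed rates the first check succeeds and the second fails, in line with the claim that the two constraints are necessary as well as sufficient.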
                                In the Markov chain, two purines or two pyrimidines are said to differ by
                              a transition. (This convention of the evolutionary biologists is confusing.
                              All states differ by what a probabilist would call a single transition of the
                              chain. However, we will defer to the biologists on this point.) A purine and
                              a pyrimidine are said to differ by a transversion. The matrix Λ displays a
                              modest amount of symmetry in the sense that the two transversions leading
                              to any given state always share the same transition rate.
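
The biologists' terminology and the symmetry just described can be made concrete with a short sketch. This Python fragment is illustrative only. The rate labels are plain strings standing for the entries of Λ, and the name `epsilon` for the G-to-A rate is an assumption.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
STATES = ["A", "G", "C", "T"]

# Off-diagonal entries of the generator, keyed by (source, target).
RATES = {
    ("A", "G"): "alpha",   ("A", "C"): "gamma", ("A", "T"): "lambda",
    ("G", "A"): "epsilon", ("G", "C"): "gamma", ("G", "T"): "lambda",
    ("C", "A"): "delta",   ("C", "G"): "kappa", ("C", "T"): "beta",
    ("T", "A"): "delta",   ("T", "G"): "kappa", ("T", "C"): "sigma",
}


def substitution_type(x, y):
    """Classify a change between two distinct bases in the biologists' sense."""
    if x == y:
        raise ValueError("the two bases must differ")
    same_class = (x in PURINES) == (y in PURINES)
    return "transition" if same_class else "transversion"


def transversion_rates_into(target):
    """Set of rate labels for the transversions leading to `target`."""
    return {
        RATES[(source, target)]
        for source in STATES
        if source != target
        and substitution_type(source, target) == "transversion"
    }


for state in STATES:
    # Each set has a single label: both transversions share one rate.
    print(state, transversion_rates_into(state))
```

Each printed set contains exactly one label, which is the symmetry the model exploits: the rate of a transversion depends only on the base it leads to.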
                                In principle, it is possible to solve for the finite-time transition matrix
                              P(t) in this model by exponentiating the infinitesimal generator Λ. To avoid
                              this rather cumbersome calculation, we generalize the arguments of Kimura
                              [13] and exploit the symmetry inherent in the model. Define q_AY(t) to be
                              the probability that the chain is in either of the two pyrimidines C or T