Page 225 - Applied Probability
10. Molecular Phylogeny
10.5 A Nucleotide Substitution Model
Models for nucleotide substitution are of great importance in molecular
evolution. Kimura [13], among others, views the changes occurring at a
single position or site as a continuous-time Markov chain involving the four
bases A, G, C, and T. The matrix Λ below gives the transition rates under
a generalization of Kimura’s model of neutral evolution. In this matrix the
rows and columns are labeled by the four states in the order A, G, C, and
T from top to bottom and left to right.
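\[
\Lambda \;=\;
\begin{array}{c|cccc}
  & A & G & C & T \\ \hline
A & -(\alpha + \gamma + \lambda) & \alpha & \gamma & \lambda \\
G & \epsilon & -(\epsilon + \gamma + \lambda) & \gamma & \lambda \\
C & \delta & \kappa & -(\delta + \kappa + \beta) & \beta \\
T & \delta & \kappa & \sigma & -(\delta + \kappa + \sigma)
\end{array}
\]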
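As a concrete illustration of the rate matrix, the following minimal sketch builds Λ from eight rates and fills in each diagonal entry as the negative of its row's off-diagonal sum, so that every row sums to zero. The function name `build_generator`, the keyword `lam` (standing in for λ, since `lambda` is reserved in Python), the name `epsilon` for the G-to-A rate, and the numerical rates are all assumptions made for the example, not values from the text.

```python
from fractions import Fraction

BASES = ("A", "G", "C", "T")  # state order used for rows and columns


def build_generator(alpha, beta, gamma, delta, epsilon, kappa, lam, sigma):
    """Return the 4x4 rate matrix as nested lists of exact Fractions."""
    off_diagonal = {
        ("A", "G"): alpha, ("A", "C"): gamma, ("A", "T"): lam,
        ("G", "A"): epsilon, ("G", "C"): gamma, ("G", "T"): lam,
        ("C", "A"): delta, ("C", "G"): kappa, ("C", "T"): beta,
        ("T", "A"): delta, ("T", "G"): kappa, ("T", "C"): sigma,
    }
    Q = [[Fraction(0)] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                Q[i][j] = Fraction(off_diagonal[(x, y)])
        # The diagonal entry is minus the total rate of leaving state x.
        Q[i][i] = -sum(Q[i])
    return Q


# Arbitrary illustrative rates (hypothetical, chosen only for the example).
Q = build_generator(alpha=2, beta=1, gamma=2, delta=3,
                    epsilon=6, kappa=1, lam=4, sigma=Fraction(1, 2))

for base, row in zip(BASES, Q):
    print(base, [str(entry) for entry in row])
```

Exact `Fraction` arithmetic is used so that the row-sum check is an equality test rather than a floating-point tolerance check.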
Without further restrictions, this chain does not satisfy detailed balance.
If we impose the additional constraints βγ = λσ and αδ = εκ, then the
distribution
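\[
\begin{aligned}
\pi_A &= \frac{\delta}{\gamma + \delta + \kappa + \lambda} \\
\pi_G &= \frac{\kappa}{\gamma + \delta + \kappa + \lambda} \\
\pi_C &= \frac{\gamma}{\gamma + \delta + \kappa + \lambda} \\
\pi_T &= \frac{\lambda}{\gamma + \delta + \kappa + \lambda}
\end{aligned}
\tag{10.11}
\]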
satisfies detailed balance. To verify detailed balance, one must check six
equalities of the type (10.10), one for each unordered pair of distinct
states. For instance, π_A α = π_G ε follows directly from the definitions
of π_A and π_G and the constraint αδ = εκ. Kolmogorov’s criterion indicates
that the two stated constraints are necessary as well as sufficient for
detailed balance.
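The six detailed balance equalities can also be checked numerically. The sketch below is illustrative only: the helper names, the keyword `lam` for λ, the name `epsilon` for the G-to-A rate, and the rate values are assumptions. It picks six rates freely, solves the two constraints for the remaining two (so that epsilon · κ = α · δ and σ · λ = β · γ), and then tests π_i Λ_ij = π_j Λ_ji for every pair of distinct states. It also shows that breaking one constraint breaks the corresponding equality.

```python
from fractions import Fraction
from itertools import combinations

BASES = ("A", "G", "C", "T")


def generator(alpha, beta, gamma, delta, epsilon, kappa, lam, sigma):
    """Rate matrix in the state order A, G, C, T, with exact entries."""
    rows = [
        [0, alpha, gamma, lam],
        [epsilon, 0, gamma, lam],
        [delta, kappa, 0, beta],
        [delta, kappa, sigma, 0],
    ]
    Q = [[Fraction(v) for v in row] for row in rows]
    for i in range(4):
        Q[i][i] = -sum(Q[i])  # each row must sum to zero
    return Q


def candidate_distribution(gamma, delta, kappa, lam):
    """The distribution proposed in the text, in the order A, G, C, T."""
    total = Fraction(gamma + delta + kappa + lam)
    return [Fraction(v) / total for v in (delta, kappa, gamma, lam)]


def detailed_balance_violations(pi, Q):
    """List the state pairs (x, y) where pi_x * rate(x->y) != pi_y * rate(y->x)."""
    return [
        (BASES[i], BASES[j])
        for i, j in combinations(range(4), 2)
        if pi[i] * Q[i][j] != pi[j] * Q[j][i]
    ]


# Hypothetical free rates; epsilon and sigma are then forced by the constraints.
rates = dict(alpha=2, beta=1, gamma=2, delta=3, kappa=1, lam=4)
rates["epsilon"] = Fraction(rates["alpha"] * rates["delta"], rates["kappa"])
rates["sigma"] = Fraction(rates["beta"] * rates["gamma"], rates["lam"])

Q = generator(**rates)
pi = candidate_distribution(rates["gamma"], rates["delta"],
                            rates["kappa"], rates["lam"])
balanced = detailed_balance_violations(pi, Q)

# Perturb epsilon so that the constraint linking it to alpha, delta, kappa fails.
broken = dict(rates, epsilon=5)
unbalanced = detailed_balance_violations(pi, generator(**broken))

print("violations with constraints:", balanced)
print("violations with epsilon perturbed:", unbalanced)
```

With the constraints in force the list of violations is empty; with epsilon perturbed, only the A and G pair fails, which matches the role that constraint plays in the argument.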
In the Markov chain, two purines (A and G) or two pyrimidines (C and T) are said to differ by
a transition. (This convention of the evolutionary biologists is confusing.
All states differ by what a probabilist would call a single transition of the
chain. However, we will defer to the biologists on this point.) A purine and
a pyrimidine are said to differ by a transversion. The matrix Λ displays a
modest amount of symmetry in the sense that the two transversions leading
to any given state always share the same transition rate.
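The transition and transversion terminology, and the symmetry just described, can be made concrete with a small sketch. The names `substitution_type`, `RATE_NAME`, and `transversion_rates_into` are hypothetical, and the rates are stored as symbol names rather than numbers, with `epsilon` assumed as the name of the G-to-A rate.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Symbolic rate of each substitution (from, to), read off the rate matrix.
RATE_NAME = {
    ("A", "G"): "alpha", ("A", "C"): "gamma", ("A", "T"): "lambda",
    ("G", "A"): "epsilon", ("G", "C"): "gamma", ("G", "T"): "lambda",
    ("C", "A"): "delta", ("C", "G"): "kappa", ("C", "T"): "beta",
    ("T", "A"): "delta", ("T", "G"): "kappa", ("T", "C"): "sigma",
}


def substitution_type(x, y):
    """Classify a change between two distinct bases in the biologists' sense."""
    if x == y:
        raise ValueError("the two bases must differ")
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def transversion_rates_into(target):
    """Set of symbolic rates of the transversions that lead to `target`."""
    return {
        RATE_NAME[(source, target)]
        for source in "AGCT"
        if source != target
        and substitution_type(source, target) == "transversion"
    }


for base in "AGCT":
    print(base, transversion_rates_into(base))
```

Each printed set contains a single symbol, which is the symmetry in question: both transversions into a given base occur at the same rate.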
In principle, it is possible to solve for the finite-time transition matrix
P(t) in this model by exponentiating the infinitesimal generator Λ. To avoid
this rather cumbersome calculation, we generalize the arguments of Kimura
[13] and exploit the symmetry inherent in the model. Define q_AY(t) to be
the probability that the chain is in either of the two pyrimidines C or T