Page 233 - Applied Probability
reversible nucleotide model into a reversible codon model. Suppose the
equilibrium distribution of the nucleotide model is given by $\pi_b$ for each
nucleotide $b$. Under reversibility, $\pi_b \omega_{bd} = \pi_d \omega_{db}$.
If the acceptance probabilities are symmetric, then we claim that the
equilibrium probability of the codon $(a, b, c)$ is $\pi_a \pi_b \pi_c$ up to
a multiplicative constant. The proof of
this statement is simply the equality
$$\pi_a \pi_b \pi_c \, \omega_{bd} \, \rho = \pi_a \pi_d \pi_c \, \omega_{db} \, \rho,$$
where $\rho = \rho_{(a,b,c) \to (a,d,c)} = \rho_{(a,d,c) \to (a,b,c)}$. Because
stop codons are omitted, we must normalize our proposed equilibrium
distribution by dividing its entries by the sum
$\sum_{(a,b,c)} \pi_a \pi_b \pi_c$ taken over all non-stop codons $(a, b, c)$.
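The normalization and the detailed balance identity can be checked numerically. The following is a minimal sketch, not the text's own procedure. The nucleotide frequencies in `PI`, the rate choice in `omega` (each rate set equal to the equilibrium frequency of the target nucleotide, which is one simple reversible choice), and the acceptance value 0.4 are all hypothetical and chosen only for illustration. The three stop codons are those of the standard genetic code.

```python
import math
from itertools import product

# Hypothetical nucleotide equilibrium frequencies (illustrative values only).
PI = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

# Stop codons of the standard genetic code; these are omitted from the model.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_equilibrium(pi):
    """Return the product distribution over sense codons, normalized to sum to 1."""
    weights = {
        a + b + c: pi[a] * pi[b] * pi[c]
        for a, b, c in product("ACGT", repeat=3)
        if a + b + c not in STOP_CODONS
    }
    total = sum(weights.values())  # sum over all non-stop codons
    return {codon: w / total for codon, w in weights.items()}


def omega(b, d, pi):
    """Hypothetical reversible nucleotide rate from b to d.

    Assumption: omega_bd = pi_d, so that pi_b * omega_bd = pi_d * omega_db.
    """
    return pi[d]


def flux(src, dst, dist, pi, rho):
    """Probability flow from codon src to codon dst, which differ at one position."""
    pos = next(i for i in range(3) if src[i] != dst[i])
    return dist[src] * omega(src[pos], dst[pos], pi) * rho


dist = codon_equilibrium(PI)
# A symmetric acceptance probability is used in both directions.
forward = flux("ACG", "ATG", dist, PI, rho=0.4)
backward = flux("ATG", "ACG", dist, PI, rho=0.4)
print(len(dist), math.isclose(forward, backward))
```

The two flows should agree up to rounding because the same symmetric acceptance probability multiplies both sides, which is exactly the argument in the displayed equality.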
In practice it is advantageous to group the twenty amino acids into
penalty sets having roughly similar charge properties [19]. The most
natural division according to charge consists of four groups: the non-polar
amino acids $S_1 = \{G, I, V, L, A, M, P, F, W\}$, the positive-polar/positively
charged amino acids $S_2 = \{Q, N, Y, H, K, R\}$, the negative-polar/negatively
charged amino acids $S_3 = \{S, T, E, D\}$, and the single amino acid cysteine
$S_4 = \{C\}$. Cysteine is put into a group by itself because of its propensity
to form disulfide bonds bridging different parts of a protein. In the sets
$S_1$ through $S_4$, we use the amino acid abbreviations listed in Table A.1 of
Appendix A. To achieve a parsimonious parameterization of the acceptance
probabilities, we distinguish the acceptance probability $\rho_0$ within a group,
the acceptance probability $\rho_1$ between a polar and a nonpolar group, the
acceptance probability $\rho_2$ between different polar groups, and the acceptance
probability $\rho_3$ involving substitution of a cysteine. One would anticipate
that $\rho_0 > \rho_1 > \rho_2$ and $\rho_0 > \rho_3$. The one case where we might expect
the symmetry condition to fail involves $\rho_3$. Substitution of another amino
acid for a cysteine involved in a disulfide bond is bound to be much less
likely than the reverse substitution. However, it would take an enormous
amount of data to see this effect, and it is mathematically advantageous to
maintain symmetry for the sake of reversibility.
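The four-parameter scheme amounts to a lookup on the groups of the two amino acids involved. The sketch below is one possible encoding under stated assumptions. The numerical values in `RHO` are hypothetical and merely respect the anticipated ordering. The rule that a substitution leaving the amino acid unchanged counts as within-group is also an assumption, since the text does not address that case.

```python
# The four penalty sets from the text, in one-letter amino acid codes.
GROUPS = {
    "S1": set("GIVLAMPFW"),  # non-polar
    "S2": set("QNYHKR"),     # positive-polar / positively charged
    "S3": set("STED"),       # negative-polar / negatively charged
    "S4": set("C"),          # cysteine alone
}

# Map each amino acid to the name of its group.
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}

# Hypothetical acceptance probabilities (rho_0, rho_1, rho_2, rho_3),
# chosen only to satisfy rho_0 > rho_1 > rho_2 and rho_0 > rho_3.
RHO = (1.0, 0.5, 0.25, 0.1)


def acceptance(x, y, rho=RHO):
    """Return the acceptance probability for replacing amino acid x by y."""
    gx, gy = GROUP_OF[x], GROUP_OF[y]
    if gx == gy:
        return rho[0]  # within a group (assumed to cover x == y as well)
    if "S4" in (gx, gy):
        return rho[3]  # any substitution involving cysteine
    if "S1" in (gx, gy):
        return rho[1]  # between the non-polar group and a polar group
    return rho[2]      # between the two polar groups


pairs = [("G", "A"), ("G", "K"), ("K", "D"), ("C", "G")]
values = [acceptance(x, y) for x, y in pairs]
```

Because the function depends only on the unordered pair of groups, `acceptance(x, y)` equals `acceptance(y, x)` for every pair, which is the symmetry needed for reversibility.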
10.9 Variation in the Rate of Evolution
Some amino acids of a protein are so crucial to function and structure that
they strongly resist substitution. Because of the division of a protein into
functional and structural domains, these resistant codon sites tend to be
clumped. Cross-taxa comparisons can help identify the resistant sites and
the level of spatial correlation. The key is to use the theoretical machinery
of Gibbs random fields [2, 12, 24]. For the sake of argument, suppose that
we classify codon sites as fast or slow evolvers using an indicator random
variable $C_i$ that equals 1 when codon site $i$ is slow evolving and equals 0