Page 233 - Applied Probability

reversible nucleotide model into a reversible codon model. Suppose the
equilibrium distribution of the nucleotide model is given by $\pi_b$ for
each nucleotide $b$. Under reversibility, $\pi_b \omega_{bd} = \pi_d \omega_{db}$.
If the acceptance probabilities are symmetric, then we claim that the
equilibrium probability of the codon $(a, b, c)$ is $\pi_a \pi_b \pi_c$ up to a
multiplicative constant. The proof of this statement is simply the equality

$$\pi_a \pi_b \pi_c \, \omega_{bd} \, \rho = \pi_a \pi_d \pi_c \, \omega_{db} \, \rho,$$

where $\rho = \rho_{(a,b,c) \to (a,d,c)} = \rho_{(a,d,c) \to (a,b,c)}$. Because stop
codons are omitted, we must normalize our proposed equilibrium distribution
by dividing its entries by the sum $\sum_{(a,b,c)} \pi_a \pi_b \pi_c$ taken over
all non-stop codons $(a, b, c)$.
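
The product-form claim and its normalization can be checked numerically. The following is a minimal sketch, not the author's implementation. It assumes the standard genetic code (stop codons TAA, TAG, TGA), hypothetical nucleotide frequencies, and an illustrative reversible rate choice in which the rate from b to d equals the equilibrium frequency of d. The names `codon_equilibrium` and `omega` are introduced here for illustration only.

```python
import math
from itertools import product

NUCLEOTIDES = "ACGT"
# Assumption: standard genetic code, so three stop codons are omitted.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_equilibrium(pi):
    """Normalized product-form distribution over the non-stop codons."""
    weights = {}
    for a, b, c in product(NUCLEOTIDES, repeat=3):
        codon = a + b + c
        if codon not in STOP_CODONS:
            weights[codon] = pi[a] * pi[b] * pi[c]
    total = sum(weights.values())  # the normalizing sum over non-stop codons
    return {codon: w / total for codon, w in weights.items()}


def omega(b, d, pi):
    """Illustrative reversible nucleotide rate (assumption): omega_bd = pi_d.

    With this choice pi_b * omega_bd = pi_b * pi_d, which is symmetric in b and d.
    """
    return pi[d]


# Hypothetical nucleotide frequencies, chosen only for illustration.
pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
eq = codon_equilibrium(pi)

# Detailed balance for a middle-position change (G,C,T) <-> (G,A,T)
# with a symmetric, hypothetical acceptance probability rho.
rho = 0.5
forward = eq["GCT"] * omega("C", "A", pi) * rho
backward = eq["GAT"] * omega("A", "C", pi) * rho
balanced = math.isclose(forward, backward)
```

Because both sides of the balance equation are divided by the same normalizing sum, the check succeeds regardless of which codons are excluded, which is why omitting stop codons only changes the constant.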
  In practice it is advantageous to group the twenty amino acids into
penalty sets having roughly similar charge properties [19]. The most
natural division according to charge consists of four groups: the non-polar
amino acids $S_1 = \{G, I, V, L, A, M, P, F, W\}$, the positive-polar/positively
charged amino acids $S_2 = \{Q, N, Y, H, K, R\}$, the negative-polar/negatively
charged amino acids $S_3 = \{S, T, E, D\}$, and the single amino acid cysteine
$S_4 = \{C\}$. Cysteine is put into a group by itself because of its propensity
to form disulfide bonds bridging different parts of a protein. In the sets
$S_1$ through $S_4$, we use the amino acid abbreviations listed in Table A.1 of
Appendix A. To achieve a parsimonious parameterization of the acceptance
probabilities, we distinguish the acceptance probability $\rho_0$ within a
group, the acceptance probability $\rho_1$ between a polar and a non-polar
group, the acceptance probability $\rho_2$ between different polar groups, and
the acceptance probability $\rho_3$ involving substitution of a cysteine. One
would anticipate that $\rho_0 > \rho_1 > \rho_2$ and $\rho_0 > \rho_3$. The one
case where we might expect the symmetry condition to fail involves $\rho_3$.
Substitution of another amino acid for a cysteine involved in a disulfide
bond is bound to be much less likely than the reverse substitution. However,
it would take an enormous amount of data to see this effect, and it is
mathematically advantageous to maintain symmetry for the sake of
reversibility.
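
The four-parameter scheme just described can be sketched as a lookup on group membership. This is an illustrative sketch only: the function names and the numerical values in `RHO` are hypothetical, chosen merely to respect the anticipated ordering, and the text does not say how a change that leaves the amino acid unchanged should be scored, so the code treats it as a within-group change by assumption.

```python
# Penalty sets from the text, using one-letter amino acid codes.
NONPOLAR = set("GIVLAMPFW")   # S1
POS_POLAR = set("QNYHKR")     # S2
NEG_POLAR = set("STED")       # S3
CYSTEINE = {"C"}              # S4
GROUPS = [NONPOLAR, POS_POLAR, NEG_POLAR, CYSTEINE]

# Hypothetical values satisfying rho0 > rho1 > rho2 and rho0 > rho3.
RHO = {0: 0.8, 1: 0.4, 2: 0.2, 3: 0.1}


def group_index(amino_acid):
    """Return 0..3 for the penalty set S1..S4 containing the amino acid."""
    for index, group in enumerate(GROUPS):
        if amino_acid in group:
            return index
    raise ValueError(f"unknown amino acid: {amino_acid!r}")


def acceptance(x, y, rho=RHO):
    """Symmetric acceptance probability for replacing amino acid x by y."""
    gx, gy = group_index(x), group_index(y)
    if gx == gy:
        # Within a group (assumption: also covers an unchanged amino acid).
        return rho[0]
    if 3 in (gx, gy):
        # Any substitution to or from cysteine.
        return rho[3]
    if 0 in (gx, gy):
        # Between the non-polar group and a polar group.
        return rho[1]
    # Between the two different polar groups.
    return rho[2]


ALL_AMINO_ACIDS = sorted(set().union(*GROUPS))
```

Because the value depends only on the unordered pair of groups, `acceptance(x, y)` equals `acceptance(y, x)` for every pair, which is the symmetry that reversibility requires.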


10.9 Variation in the Rate of Evolution


Some amino acids of a protein are so crucial to function and structure that
they strongly resist substitution. Because of the division of a protein into
functional and structural domains, these resistant codon sites tend to be
clumped. Cross-taxa comparisons can help identify the resistant sites and
the level of spatial correlation. The key is to use the theoretical machinery
of Gibbs random fields [2, 12, 24]. For the sake of argument, suppose that
we classify codon sites as fast or slow evolvers using an indicator random
variable $C_i$ that equals 1 when codon site $i$ is slow evolving and equals 0