Page 228 - Applied Probability

10. Molecular Phylogeny
                                In summary, we have found the top row $[p_{AA}(t), p_{AG}(t), p_{AC}(t), p_{AT}(t)]$
                              of $P(t)$ corresponding to the nucleotide A. By symmetrical arguments, the
                              other rows of $P(t)$ can also be calculated. In the limit $t \to \infty$, the rows of
                              $P(t)$ all collapse to the equilibrium distribution $\pi$.
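
The limiting behavior can be checked numerically. The sketch below does not use the generalized Kimura model of this section; it swaps in the simpler Jukes-Cantor model, in which every base changes to each other base at one common rate and the equilibrium distribution is uniform. The function name `jc_transition_matrix` and the rate parameter `alpha` are illustrative assumptions, not notation from the text.

```python
import math

BASES = "AGCT"  # row and column order, matching the top row for A in the text


def jc_transition_matrix(t, alpha=1.0):
    """Jukes-Cantor P(t): each base changes to each other base at rate alpha.

    This is a stand-in for the generalized Kimura model, chosen only
    because its transition probabilities have a short closed form.
    """
    decay = math.exp(-4.0 * alpha * t)
    same = 0.25 + 0.75 * decay  # probability of ending on the starting base
    diff = 0.25 - 0.25 * decay  # probability of ending on any one other base
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


# Print the row for nucleotide A at increasing times.
for t in (0.0, 0.1, 1.0, 10.0):
    top_row = jc_transition_matrix(t)[0]
    print(t, [round(p, 4) for p in top_row])
```

At t = 0 the matrix is the identity, and as t grows every row approaches (1/4, 1/4, 1/4, 1/4), the equilibrium distribution of this simplified model.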
                              10.6 Maximum Likelihood Reconstruction
                              Maximum likelihood provides a second method of comparing evolutionary
                              trees. As with maximum parsimony, DNA data are gathered at several
                              different sites for several different contemporary taxa. A model is then
                              posed for how differences evolve at the various sites. Most models involve
                              the following assumptions:
                              (a) All sites evolve according to the same tree.
                              (b) All sites evolve independently.

                              (c) All sites evolve according to the same stochastic laws.
                              (d) Conditional on the base at a given site of an internal node, evolution
                                   proceeds independently at the site along the two branches of the tree
                                   descending from the node.

                              Further assumptions about the detailed nature of evolution at a site can be
                              imposed. For instance, we can adopt the generalized Kimura substitution
                              model as just developed.
                                We now discuss how to compute the likelihood of the bases observed
                              at the tips of an evolutionary tree for a particular site. According to
                              assumptions (a), (b), and (c), we need merely multiply these site-specific
                              likelihoods to recover the overall likelihood of a given tree. For a tree with
                              $n$ tips, it is convenient to label the internal nodes $1, \ldots, n-1$ and the tips
                              $n, \ldots, 2n-1$. Also, let $b_i$ be either one of the four possible bases at an
                              internal node or the observed base at a tip. If the root is node 1, then
                              designate the prior probability of base $b_1$ at this node by $q_{b_1}$. Assumption
                              (d) now provides the likelihood expression

                              $$\sum_{b_1} \cdots \sum_{b_{n-1}} q_{b_1} \prod_{(i,j)} \Pr(b_j \mid b_i), \tag{10.16}$$

                              where (i, j) ranges over all pairs of ancestral nodes i and direct descendant
                              nodes j.
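
To make expression (10.16) concrete, the following sketch evaluates it by brute force on a hypothetical tree with n = 3 tips, then combines two sites by summing log-likelihoods, which is the multiplication of site-specific likelihoods described above. The tree, branch lengths, observed bases, and the Jukes-Cantor transition probabilities in `jc_prob` are all assumptions made for illustration; the text itself suggests the generalized Kimura model. The names `site_likelihood` and `log_likelihood` are likewise hypothetical.

```python
import math
from itertools import product

BASES = "AGCT"


def jc_prob(parent, child, t, alpha=1.0):
    """Pr(child | parent) along a branch of length t (Jukes-Cantor stand-in)."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * decay if parent == child else 0.25 - 0.25 * decay


def site_likelihood(edges, n_tips, tip_bases, root_prior):
    """Evaluate (10.16) for one site by enumerating all internal-node bases.

    edges      -- {(i, j): branch length}, i ancestral, j its direct descendant
    tip_bases  -- {tip node: observed base}, tips numbered n..2n-1
    root_prior -- {base: q_base} for the root, node 1
    """
    internal = list(range(1, n_tips))  # internal nodes 1..n-1
    total = 0.0
    for assignment in product(BASES, repeat=len(internal)):
        bases = dict(zip(internal, assignment))
        bases.update(tip_bases)
        term = root_prior[bases[1]]          # prior factor q_{b_1}
        for (i, j), t in edges.items():      # product over pairs (i, j)
            term *= jc_prob(bases[i], bases[j], t)
        total += term
    return total


def log_likelihood(edges, n_tips, sites, root_prior):
    """Sites are independent, so their likelihoods multiply (logs add)."""
    return sum(math.log(site_likelihood(edges, n_tips, s, root_prior))
               for s in sites)


# Hypothetical tree: root 1 has children 2 and 3; node 2 has children 4 and 5.
edges = {(1, 2): 0.1, (1, 3): 0.3, (2, 4): 0.2, (2, 5): 0.2}
uniform = {b: 0.25 for b in BASES}
sites = [{3: "A", 4: "A", 5: "G"}, {3: "C", 4: "C", 5: "C"}]
print(log_likelihood(edges, 3, sites, uniform))
```

The enumeration visits 4^(n-1) assignments of bases to the internal nodes, so it is practical only for very small trees.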
                                The sums-of-products expression (10.16) is analogous to our earlier
                              representation of a pedigree likelihood. The factor $q_{b_1}$ corresponds to a prior,
                              and the factor $\Pr(b_j \mid b_i)$ to a transmission probability. There is no
                              analog of a penetrance function or of genotype elimination in this context. To
                              evaluate expression (10.16), we carry out one summation at a time. It is