Page 228 - Applied Probability

10. Molecular Phylogeny
In summary, we have found the top row $[p_{AA}(t), p_{AG}(t), p_{AC}(t), p_{AT}(t)]$
of P(t) corresponding to the nucleotide A. By symmetrical arguments, the
other rows of P(t) can also be calculated. In the limit t → ∞, the rows of
P(t) all collapse to the equilibrium distribution π.
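The collapse of the rows of P(t) to π can be checked numerically. The sketch below does not use the generalized Kimura model; it assumes a hypothetical rate matrix `Q` in which every base is replaced by base j at rate `PI[j]`, chosen only because its equilibrium distribution is `PI` by construction. The frequencies in `PI`, the helper names, and the scaling-and-squaring approximation of the matrix exponential are all illustrative assumptions.

```python
import math

# Hypothetical equilibrium distribution over the bases A, G, C, T (an assumption).
PI = [0.1, 0.2, 0.3, 0.4]

# Hypothetical rate matrix: base i changes to base j at rate PI[j].
# Each diagonal entry is set so that the row sums to zero.
Q = [[PI[j] - (1.0 if i == j else 0.0) for j in range(4)] for i in range(4)]


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, squarings=20):
    """Approximate P(t) = exp(t * q) by scaling and squaring.

    A second-order Taylor expansion is used for the tiny step
    t / 2**squarings, and the result is then squared repeatedly.
    """
    n = len(q)
    step = t / 2 ** squarings
    sq = [[step * q[i][j] for j in range(n)] for i in range(n)]
    sq2 = mat_mul(sq, sq)
    p = [[(1.0 if i == j else 0.0) + sq[i][j] + sq2[i][j] / 2.0
          for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        p = mat_mul(p, p)
    return p


P_SHORT = transition_matrix(Q, 0.1)   # rows still close to the identity matrix
P_LONG = transition_matrix(Q, 50.0)   # rows should all be close to PI
```

For this particular `Q` the exact answer is known in closed form, P(t) = e^{-t} I + (1 − e^{-t}) 1π, which gives an independent check on the approximation.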
10.6 Maximum Likelihood Reconstruction
Maximum likelihood provides a second method of comparing evolutionary
trees. As with maximum parsimony, DNA data are gathered at several
different sites for several different contemporary taxa. A model is then
posed for how differences evolve at the various sites. Most models involve
the following assumptions:
(a) All sites evolve according to the same tree.
(b) All sites evolve independently.

(c) All sites evolve according to the same stochastic laws.
(d) Conditional on the base at a given site of an internal node, evolution
proceeds independently at the site along the two branches of the tree
descending from the node.

Further assumptions about the detailed nature of evolution at a site can be
imposed. For instance, we can adopt the generalized Kimura substitution
model as just developed.
We now discuss how to compute the likelihood of the bases observed
at the tips of an evolutionary tree for a particular site. According to
assumptions (a), (b), and (c), we need merely multiply these site-specific
likelihoods to recover the overall likelihood of a given tree. For a tree with
$n$ tips, it is convenient to label the internal nodes $1, \ldots, n-1$ and the tips
$n, \ldots, 2n-1$. Also, let $b_i$ be either one of the four possible bases at an
internal node or the observed base at a tip. If the root is node 1, then
designate the prior probability of base $b_1$ at this node by $q_{b_1}$. Assumption
(d) now provides the likelihood expression

$$\sum_{b_1} \cdots \sum_{b_{n-1}} q_{b_1} \prod_{(i,j)} \Pr(b_j \mid b_i), \tag{10.16}$$

where (i, j) ranges over all pairs of ancestral nodes i and direct descendant
nodes j.
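As an illustration, expression (10.16) can be evaluated by brute force on a small tree, and the site likelihoods can then be combined across sites as assumptions (a), (b), and (c) permit. The sketch below is not the generalized Kimura model: it assumes a simple stand-in substitution model in which Pr(child | parent) over a branch of length t equals e^{-t} 1{child = parent} + (1 − e^{-t}) π_child, and it uses the same hypothetical frequencies `PI` for the root prior q. The tree `EDGES`, the branch lengths, and the data in `SITES` are invented for the example.

```python
import itertools
import math

BASES = "ACGT"
# Hypothetical base frequencies, used both as the root prior q and in the model.
PI = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}


def transition(parent, child, t):
    """Pr(child | parent) over a branch of length t (assumed stand-in model)."""
    stay = math.exp(-t) if parent == child else 0.0
    return stay + (1.0 - math.exp(-t)) * PI[child]


def site_likelihood(edges, tip_bases, n):
    """Sum over all bases at internal nodes 1..n-1 of q times the branch product."""
    total = 0.0
    for internal in itertools.product(BASES, repeat=n - 1):
        b = dict(zip(range(1, n), internal))  # bases at internal nodes
        b.update(tip_bases)                   # observed bases at tips n..2n-1
        term = PI[b[1]]                       # prior at the root, node 1
        for i, j, t in edges:                 # ancestor i, direct descendant j
            term *= transition(b[i], b[j], t)
        total += term
    return total


def log_likelihood(edges, sites, n):
    """Log of the product of site likelihoods, per assumptions (a), (b), (c)."""
    return sum(math.log(site_likelihood(edges, tips, n)) for tips in sites)


# Hypothetical tree with n = 3 tips: internal nodes 1 and 2, tips 3, 4, and 5.
EDGES = [(1, 2, 0.1), (1, 3, 0.3), (2, 4, 0.2), (2, 5, 0.2)]
# Hypothetical observed bases at two sites, keyed by tip label.
SITES = [{3: "A", 4: "A", 5: "G"}, {3: "C", 4: "C", 5: "C"}]

TREE_LOG_LIKELIHOOD = log_likelihood(EDGES, SITES, 3)
```

The brute-force sum visits 4^(n−1) assignments of bases to the internal nodes, so it is practical only for very small trees. Summing the site likelihoods over all 4^n possible tip patterns should give 1, which is a convenient sanity check.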
The sums-of-products expression (10.16) is analogous to our earlier
representation of a pedigree likelihood. The factor $q_{b_1}$ corresponds to a prior,
and the factor $\Pr(b_j \mid b_i)$ to a transmission probability. There is no
analog of a penetrance function or of genotype elimination in this context. To
evaluate expression (10.16), we carry out one summation at a time. It is
