Page 240 - Applied Probability

10. Molecular Phylogeny
                                   respectively, where the constants $c_1, \ldots, c_6$ are the same ones defined in
                                   equations (10.12) through (10.15).
                                15. For the nucleotide substitution model of Section 10.5, verify in general
                                   that the equilibrium distribution is
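\[
\begin{aligned}
\pi_A &= \frac{\epsilon(\delta+\kappa)+\delta(\gamma+\lambda)}{(\alpha+\gamma+\epsilon+\lambda)(\gamma+\delta+\kappa+\lambda)} \\
\pi_G &= \frac{\alpha(\delta+\kappa)+\kappa(\gamma+\lambda)}{(\alpha+\gamma+\epsilon+\lambda)(\gamma+\delta+\kappa+\lambda)} \\
\pi_C &= \frac{\gamma(\delta+\kappa)+\sigma(\gamma+\lambda)}{(\beta+\delta+\kappa+\sigma)(\gamma+\delta+\kappa+\lambda)} \\
\pi_T &= \frac{\lambda(\delta+\kappa)+\beta(\gamma+\lambda)}{(\beta+\delta+\kappa+\sigma)(\gamma+\delta+\kappa+\lambda)}.
\end{aligned}
\]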
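Before attempting the algebra in Problem 15, a numerical spot check can be reassuring. The sketch below is not part of the problem statement. It assumes that the Section 10.5 generator, with states ordered A, G, C, T, has off-diagonal rates A→G = alpha, G→A = epsilon, C→T = beta, T→C = sigma, purine→C = gamma, purine→T = lambda, pyrimidine→A = delta, and pyrimidine→G = kappa. The function names and the numerical rate values are illustrative only.

```python
import numpy as np

def rate_matrix(alpha, beta, gamma, delta, epsilon, kappa, lam, sigma):
    """Assumed generator for the nucleotide model, states ordered A, G, C, T."""
    Q = np.array([
        [0.0,     alpha, gamma, lam ],   # A -> G, C, T
        [epsilon, 0.0,   gamma, lam ],   # G -> A, C, T
        [delta,   kappa, 0.0,   beta],   # C -> A, G, T
        [delta,   kappa, sigma, 0.0 ],   # T -> A, G, C
    ])
    # Each diagonal entry is minus the sum of the other entries in its row.
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

def closed_form_pi(alpha, beta, gamma, delta, epsilon, kappa, lam, sigma):
    """Equilibrium distribution claimed in Problem 15."""
    d = gamma + delta + kappa + lam
    purine = (alpha + gamma + epsilon + lam) * d
    pyrimidine = (beta + delta + kappa + sigma) * d
    return np.array([
        (epsilon * (delta + kappa) + delta * (gamma + lam)) / purine,
        (alpha * (delta + kappa) + kappa * (gamma + lam)) / purine,
        (gamma * (delta + kappa) + sigma * (gamma + lam)) / pyrimidine,
        (lam * (delta + kappa) + beta * (gamma + lam)) / pyrimidine,
    ])

# Arbitrary positive rates chosen only for illustration.
rates = dict(alpha=0.7, beta=0.3, gamma=0.2, delta=0.5,
             epsilon=1.1, kappa=0.4, lam=0.6, sigma=0.9)
Q = rate_matrix(**rates)
pi = closed_form_pi(**rates)
residual = pi @ Q          # should vanish if pi is an equilibrium distribution
print(residual, pi.sum())
```

A vanishing residual together with a unit sum supports the formula for these particular rates; it does not replace the general verification the problem asks for.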
                                16. There is an explicit formula for the equilibrium distribution of a
                                   continuous-time Markov chain in terms of weighted in-trees [20]. To
                                   describe this formula, we first define a directed graph on the states
                                   $1, \ldots, n$ of the chain. The vertices of the graph are the states of the
                                   chain, and the arcs of the graph are the ordered pairs of states $(i, j)$
                                   having transition rates $\lambda_{ij} > 0$. If it is possible to reach some
                                   designated state $k$ from every other state $i$, then a unique equilibrium
                                   distribution $\pi = (\pi_1, \ldots, \pi_n)$ exists for the chain. Note that this reachability
                                   condition is weaker than requiring that all states communicate.
                                   The equilibrium distribution is characterized by defining certain
                                   subgraphs called in-trees. An in-tree $T_i$ to state $i$ is a subgraph having
                                   $n - 1$ arcs and connecting each vertex $j \ne i$ to $i$ by some directed
                                   path. Ignoring orientations, an in-tree is graphically a tree; observing
                                   orientations, all paths lead to $i$. The weight $w(T_i)$ associated with
                                   the in-tree $T_i$ is the product of the transition rates $\lambda_{jk}$ labeling the
                                   various arcs $(j, k)$ of the in-tree. For instance, in the nucleotide
                                   substitution chain, one in-tree to A has arcs (G,A), (C,A), and (T,C).
                                   Its associated weight is $\epsilon\delta\sigma$.
                                   In general, the equilibrium distribution is given by


\[
\pi_i = \frac{\sum_{T_i} w(T_i)}{\sum_{j} \sum_{T_j} w(T_j)}. \tag{10.17}
\]
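The in-tree formula can be tried out by brute force on a small chain. The sketch below is one possible approach, not a method given in the text. It uses the fact that an in-tree to a root assigns exactly one outgoing arc to every non-root state, so it enumerates all such assignments and keeps those in which every state reaches the root. The function names and the example rate matrix are hypothetical, and the linear-algebra solution is included only as a cross-check.

```python
from itertools import product
import numpy as np

def reaches_root(parent, j, root):
    """Follow the chosen arcs from j; succeed only if the walk hits root."""
    seen = set()
    while j != root:
        if j in seen:          # a cycle that avoids the root
            return False
        seen.add(j)
        j = parent[j]
    return True

def in_trees(Q, root):
    """Yield each in-tree to `root` as a dict {j: k} of arcs (j, k)."""
    n = Q.shape[0]
    others = [j for j in range(n) if j != root]
    # Each non-root state chooses one outgoing arc with positive rate.
    choices = [[k for k in range(n) if k != j and Q[j, k] > 0] for j in others]
    for targets in product(*choices):
        parent = dict(zip(others, targets))
        if all(reaches_root(parent, j, root) for j in others):
            yield parent

def in_tree_equilibrium(Q):
    """Evaluate the in-tree formula by summing the weights of all in-trees."""
    n = Q.shape[0]
    totals = np.array([
        sum(np.prod([Q[j, k] for j, k in tree.items()])
            for tree in in_trees(Q, i))
        for i in range(n)
    ], dtype=float)
    return totals / totals.sum()

# Illustrative 4-state generator with all off-diagonal rates positive.
Q = np.array([
    [0.0, 0.7, 0.2, 0.6],
    [1.1, 0.0, 0.2, 0.6],
    [0.5, 0.4, 0.0, 0.3],
    [0.5, 0.4, 0.9, 0.0],
])
np.fill_diagonal(Q, -Q.sum(axis=1))

pi_trees = in_tree_equilibrium(Q)

# Cross-check: solve pi Q = 0 together with sum(pi) = 1 by least squares.
A = np.vstack([Q.T, np.ones(4)])
b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
pi_linear = np.linalg.lstsq(A, b, rcond=None)[0]
print(pi_trees, pi_linear)
```

The enumeration visits $(n-1)^{n-1}$ arc assignments per root when all rates are positive, so it is practical only for very small chains.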
                                   The reachability condition implies that in-trees to state k exist and
                                   consequently that the denominator in (10.17) is positive. The value
                                   of the in-tree formula (10.17) is limited by the fact that in a Markov
                                   chain with $n$ states there can be as many as $n^{n-2}$ in-trees to a given
                                   state. Thus, in the nucleotide substitution model, there are $4^{4-2} = 16$
                                   in-trees to each state and 64 in-trees in all. If you are undeterred by