Page 133 - Applied Probability
7. Computation of Mendelian Likelihoods
all consolidated intervals of the corresponding recombination fractions θ or
of their complements 1−θ, depending on whether the gamete shows
recombination on a given interval or not. The factor of 1/2 accounts for the
parental chromosome chosen for the first locus. In the exceptional case where
there are no heterozygous loci, the gamete transmission probability is 1. If
there is only one heterozygous locus, the gamete transmission probability is
1/2. Recombination fractions for consolidated intervals can be computed via
Trow’s formula as described in Problem 1.
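The rule above can be illustrated with a minimal Python sketch. The function names `trow_consolidate` and `gamete_probability` are hypothetical, and the sketch assumes Trow's formula in its usual no-interference form, 1 − 2θ = ∏(1 − 2θ_i) over the adjacent intervals being merged; Problem 1 itself is not reproduced here.

```python
from math import prod


def trow_consolidate(thetas):
    """Recombination fraction across several adjacent intervals.

    Assumes no interference, so that 1 - 2*theta equals the product
    of (1 - 2*theta_i) over the merged intervals (Trow's formula).
    """
    return 0.5 * (1.0 - prod(1.0 - 2.0 * t for t in thetas))


def gamete_probability(n_het, thetas, recombined):
    """Gamete transmission probability for one parent.

    n_het      -- number of heterozygous loci in the parent
    thetas     -- recombination fractions of the consolidated intervals
                  between successive heterozygous loci (length n_het - 1)
    recombined -- booleans, True where the gamete shows recombination
    """
    if n_het == 0:
        return 1.0          # exceptional case: nothing to distinguish
    if len(thetas) != n_het - 1 or len(recombined) != n_het - 1:
        raise ValueError("need one interval between successive heterozygous loci")
    prob = 0.5              # parental chromosome chosen at the first locus
    for theta, rec in zip(thetas, recombined):
        prob *= theta if rec else 1.0 - theta
    return prob


# Two heterozygous loci separated by a homozygous locus: the intervals
# 0.10 and 0.05 are consolidated into a single interval first.
theta_13 = trow_consolidate([0.10, 0.05])
p_recombinant = gamete_probability(2, [theta_13], [True])
```

Here `theta_13` is approximately 0.14, so the recombinant gamete has probability of roughly 0.07. With a single heterozygous locus the loop body never runs and the function returns 1/2, matching the text.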
The likelihood L of a pedigree with n people can now be assembled from
these component parts. Let the ith person have phenotype $X_i$ and possible
genotype $G_i$. Conditioning on the genotypes of each of the n people yields
Ott’s [27] representation of the likelihood
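$$
\begin{aligned}
L &= \sum_{G_1} \cdots \sum_{G_n} \Pr(X_1,\ldots,X_n \mid G_1,\ldots,G_n)\,\Pr(G_1,\ldots,G_n) \\
  &= \sum_{G_1} \cdots \sum_{G_n} \prod_i \operatorname{Pen}(X_i \mid G_i)\,\Pr(G_1,\ldots,G_n) \\
  &= \sum_{G_1} \cdots \sum_{G_n} \prod_i \operatorname{Pen}(X_i \mid G_i) \prod_j \operatorname{Prior}(G_j) \prod_{\{k,l,m\}} \operatorname{Tran}(G_m \mid G_k, G_l),
\end{aligned}
\tag{7.1}
$$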
where the product on j is taken over all founders and the product on
{k, l, m} is taken over all parent–offspring triples.
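For a very small pedigree, (7.1) can be evaluated literally as a joint sum. The Python sketch below does this for a single biallelic locus. Everything specific in it is an illustrative assumption rather than part of the text: Hardy-Weinberg founder priors with allele frequency `q`, a fully penetrant recessive trait, and the hypothetical names `prior`, `tran`, `pen`, and `likelihood`.

```python
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")  # unordered genotypes at one biallelic locus


def prior(g, q=0.1):
    """Hardy-Weinberg founder prior; q is the assumed frequency of allele a."""
    p = 1.0 - q
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}[g]


def pass_a(g):
    """Probability that a parent with genotype g transmits allele a."""
    return {"AA": 0.0, "Aa": 0.5, "aa": 1.0}[g]


def tran(child, father, mother):
    """Mendelian transmission probability Tran(child | father, mother)."""
    f, m = pass_a(father), pass_a(mother)
    return {"AA": (1 - f) * (1 - m),
            "Aa": f * (1 - m) + (1 - f) * m,
            "aa": f * m}[child]


def pen(x, g):
    """Penetrance for an assumed fully penetrant recessive trait.

    A phenotype of None means the person is unobserved.
    """
    if x is None:
        return 1.0
    return 1.0 if (x == "affected") == (g == "aa") else 0.0


def likelihood(phenotypes, founders, triples, q=0.1):
    """Joint-sum evaluation of (7.1); triples are (father, mother, child)."""
    people = list(phenotypes)
    total = 0.0
    for combo in product(GENOTYPES, repeat=len(people)):
        G = dict(zip(people, combo))
        term = 1.0
        for i in people:
            term *= pen(phenotypes[i], G[i])
        for j in founders:
            term *= prior(G[j], q)
        for k, l, m in triples:
            term *= tran(G[m], G[k], G[l])
        total += term
    return total


trio = {"father": "unaffected", "mother": "unaffected", "child": "affected"}
L = likelihood(trio, founders=["father", "mother"],
               triples=[("father", "mother", "child")])
```

Under these assumptions only the genotype combination Aa × Aa → aa contributes, so `L` is approximately (2 · 0.9 · 0.1)² / 4 = 0.0081. The loop visits 3ⁿ genotype combinations, which is why this brute-force form is practical only for toy pedigrees.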
At this point, several comments are appropriate concerning the explicit
likelihood representation (7.1). First, ranges of summation for the
genotypes are not specified. At the very least it is profitable to eliminate
any genotype $G_i$ with $\operatorname{Pen}(X_i \mid G_i) = 0$. We will
discuss later an algorithm for genotype elimination that performs much better
than this naive tactic in most circumstances. Second, the notation in (7.1)
does not make it clear whether the likelihood L should be computed as a joint
sum or as an iterated sum. One can argue rigorously that an iterated sum is
always preferable to a joint sum if minimizing counts of additions and
multiplications is taken as a criterion [18]. Viewing (7.1) as an iterated
sum opens up the possibility of rearranging the order of summation so as to
achieve the most efficient computation. Third, calculation of L is
numerically stable since only additions and multiplications of nonnegative
numbers are involved. There will be no disastrous roundoff errors due to
subtraction of quantities of similar magnitude. However, serious underflows
can be encountered because all terms are usually probabilities and hence lie
in the interval [0, 1]. Underflows can be successfully defused by repeated
rescaling and reporting the final answer as a loglikelihood. Last of all, the
various terms in (7.1) can be viewed as values taken on by arrays. For
instance, $\operatorname{Pen}(X_i \mid G_i)$ is an array of rank 1 that
depends on the possible genotypes $G_i$ for i. Similarly,
$\operatorname{Tran}(G_k \mid G_i, G_j)$ is an array of rank 3 depending on
$G_i$, $G_j$, and $G_k$ jointly. Thus, computation of L is inherently
array-oriented.
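The second, third, and last comments can be combined in one small Python sketch for a nuclear family. It is illustrative only. The names `transmission_array` and `nuclear_family_loglik` are hypothetical, the genotype coding 0 = AA, 1 = Aa, 2 = aa is an assumption, and the summation order (each child summed out separately for every parental genotype pair) is one convenient choice, not necessarily the most efficient one.

```python
import math


def transmission_array():
    """Rank-3 array T[f][m][c] = Tran(child c | father f, mother m)."""
    pass_a = [0.0, 0.5, 1.0]  # chance of transmitting allele a, by genotype
    T = [[None] * 3 for _ in range(3)]
    for f in range(3):
        for m in range(3):
            a, b = pass_a[f], pass_a[m]
            T[f][m] = [(1 - a) * (1 - b), a * (1 - b) + (1 - a) * b, a * b]
    return T


def nuclear_family_loglik(pen_father, pen_mother, pen_children, prior, tran):
    """Loglikelihood of a nuclear family computed as an iterated sum.

    Each pen_* argument is a rank-1 array over genotypes; prior is a rank-1
    array of founder priors; tran is the rank-3 transmission array.
    """
    n = len(prior)
    # Rank-2 working array indexed by the parental genotype pair (f, m).
    w = [[prior[f] * pen_father[f] * prior[m] * pen_mother[m]
          for m in range(n)] for f in range(n)]
    log_scale = 0.0
    for pen_c in pen_children:
        # Sum out this child's genotype for every parental pair.
        for f in range(n):
            for m in range(n):
                w[f][m] *= sum(pen_c[g] * tran[f][m][g] for g in range(n))
        # Rescale so the largest entry is 1, remembering the factor in logs.
        biggest = max(max(row) for row in w)
        if biggest == 0.0:
            return -math.inf
        w = [[v / biggest for v in row] for row in w]
        log_scale += math.log(biggest)
    total = sum(sum(row) for row in w)
    return log_scale + math.log(total) if total > 0.0 else -math.inf


q = 0.1                                     # assumed allele frequency
hw_prior = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]
T = transmission_array()

# Unaffected parents, one affected child, recessive fully penetrant trait.
ll_trio = nuclear_family_loglik([1, 1, 0], [1, 1, 0], [[0, 0, 1]], hw_prior, T)

# 200 children whose penetrances are all 0.01: the raw likelihood 0.01**200
# underflows in double precision, but the rescaled loglikelihood is finite.
ll_big = nuclear_family_loglik([1, 1, 1], [1, 1, 1], [[0.01] * 3] * 200,
                               hw_prior, T)
```

With c children, the joint sum ranges over 3^(c+2) genotype combinations, whereas this iterated form performs on the order of 27c multiplications. Rescaling after each child keeps the working array within floating-point range, and the accumulated logarithms give the final loglikelihood.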