Page 209 - Applied Probability
9. Descent Graph Methods
and the constellation of affecteds within the pedigree. Consider one of the
suggested statistics $U$, and denote its value on pedigree $k$ by $U_k$. If this
pedigree has observed marker phenotypes $M_k$, then our test statistic is
the conditional expectation $Z_k = E(U_k \mid M_k)$. In Chapter 6, we suggested
testing for excess marker sharing using the standardized statistic
\[
S = \frac{\sum_k w_k [Z_k - E(Z_k)]}{\sqrt{\sum_k w_k^2 \operatorname{Var}(Z_k)}},
\]
where $w_k$ is a positive weight assigned to pedigree $k$. The reader will recall
the specific recommendation
\[
w_k = \sqrt{\frac{r_k - 1}{\operatorname{Var}(Z_k)}}
\]
for a pedigree with $r_k$ affecteds. Unfortunately, the problem now arises of
how to calculate $E(Z_k)$ and $\operatorname{Var}(Z_k)$. On small pedigrees, one can compute
the unconditional values $E(U_k)$ and $\operatorname{Var}(U_k)$ simply by enumerating all
possible descent graphs. While it is true that $E(Z_k) = E(U_k)$, we can only
assert that $\operatorname{Var}(Z_k) \le \operatorname{Var}(U_k)$. If we substitute $\operatorname{Var}(U_k)$ for $\operatorname{Var}(Z_k)$, a
standard normal approximation for $S$ is bound to be conservative.
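The two relations just quoted follow from standard identities for conditional expectations. As a brief sketch in the notation of this section, the tower property and the law of total variance give

\[
\begin{aligned}
E(Z_k) &= E[E(U_k \mid M_k)] = E(U_k), \\
\operatorname{Var}(U_k) &= E[\operatorname{Var}(U_k \mid M_k)] + \operatorname{Var}[E(U_k \mid M_k)] \\
&= E[\operatorname{Var}(U_k \mid M_k)] + \operatorname{Var}(Z_k) \;\ge\; \operatorname{Var}(Z_k).
\end{aligned}
\]

Replacing each $\operatorname{Var}(Z_k)$ by the larger $\operatorname{Var}(U_k)$ can therefore only enlarge the denominator of $S$, so the resulting statistic has variance at most 1 under the null hypothesis, and tail probabilities read from the standard normal table tend to be overstated.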
It seems that the only remedy is to compute p-values empirically. The
necessary simulations are feasible if done intelligently. The fact that dif-
ferent pedigrees are independent and contribute additively to each sharing
statistic eases the pain of simulation considerably. Consider a generic sum
$S = X_1 + \cdots + X_n$ of $n$ independent random variables. For example, we
could take
\[
X_k = w_k [Z_k - E(Z_k)], \qquad w_k = \sqrt{\frac{r_k - 1}{\operatorname{Var}(U_k)}}.
\]
If we want to sample $S$ a million times, we can in principle sample the whole
vector $(X_1, \ldots, X_n)$ a million times and sum. This would be prohibitively
expensive in the pedigree case because of the work involved in simulating
each $X_k$. Simulating the statistic once for one pedigree involves completely
resampling the observed markers by gene dropping and then recomputing
the test statistic in question. Alternatively, we could sample each $X_k$, say, a
hundred times, then construct a million different vectors $(X_1, \ldots, X_n)$ by
repeatedly drawing each $X_k$ independently from its previously constructed
subsample of size one hundred. If $n$ is large and the variances $\operatorname{Var}(X_k)$ are
comparable, then this two-stage procedure is reasonably accurate and costs
a fraction of the naive procedure.
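The two-stage procedure can be illustrated with a minimal Python sketch. It is not the MENDEL implementation. The function names `two_stage_sample`, `empirical_p_value`, and `simulate_summand` are hypothetical, and `simulate_summand` merely draws a standard normal deviate as a stand-in for the expensive gene-dropping step that would produce a value of $X_k$ for pedigree $k$.

```python
import random


def two_stage_sample(subsamples, n_draws, rng=None):
    """Stage two: build n_draws values of S = X_1 + ... + X_n.

    subsamples[k] holds the stage-one simulated values of X_k.
    Each summand is drawn independently, with replacement, from
    its own previously constructed subsample.
    """
    rng = rng or random.Random()
    return [sum(rng.choice(sub) for sub in subsamples) for _ in range(n_draws)]


def empirical_p_value(simulated, observed):
    """One-sided empirical p-value: fraction of simulated sums >= observed."""
    exceed = sum(1 for s in simulated if s >= observed)
    return exceed / len(simulated)


def simulate_summand(k, rng):
    """Hypothetical placeholder for one gene-dropping simulation of X_k.

    A real implementation would resample the markers of pedigree k and
    recompute w_k [Z_k - E(Z_k)]; here we return a standard normal deviate.
    """
    return rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    n_pedigrees, stage_one_size, stage_two_size = 50, 100, 10_000
    # Stage one: the expensive part, done only stage_one_size times per pedigree.
    subsamples = [
        [simulate_summand(k, rng) for _ in range(stage_one_size)]
        for k in range(n_pedigrees)
    ]
    # Stage two: cheap recombination of the stored subsamples.
    sums = two_stage_sample(subsamples, stage_two_size, rng)
    print(empirical_p_value(sums, observed=10.0))
```

With $n$ pedigrees, stage one costs on the order of $100n$ gene-dropping simulations, whereas the naive procedure would cost a million times $n$; stage two involves only table lookups and additions.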
These ideas are now implemented in the latest version of MENDEL.
Extensive testing suggests that the statistics displayed in Table 9.2 have
the most power against the indicated alternatives [22]. The superscript
“rec” in the table refers to the original statistics given in equation (9.15)