Page 209 - Applied Probability

9. Descent Graph Methods
and the constellation of affecteds within the pedigree. Consider one of the
suggested statistics $U$, and denote its value on pedigree $k$ by $U_k$. If this
pedigree has observed marker phenotypes $M_k$, then our test statistic is
the conditional expectation $Z_k = E(U_k \mid M_k)$. In Chapter 6, we suggested
testing for excess marker sharing using the standardized statistic

$$S = \frac{\sum_k w_k [Z_k - E(Z_k)]}{\sqrt{\sum_k w_k^2 \,\mathrm{Var}(Z_k)}},$$
where $w_k$ is a positive weight assigned to pedigree $k$. The reader will recall
the specific recommendation
$$w_k = \sqrt{\frac{r_k}{\mathrm{Var}(Z_k)}}$$
for a pedigree with $r_k$ affecteds. Unfortunately, this raises the problem of
how to calculate $E(Z_k)$ and $\mathrm{Var}(Z_k)$. On small pedigrees, one can
compute the unconditional values $E(U_k)$ and $\mathrm{Var}(U_k)$ simply by
enumerating all possible descent graphs. While it is true that
$E(Z_k) = E(U_k)$, we can only assert that $\mathrm{Var}(Z_k) \le \mathrm{Var}(U_k)$.
If we substitute $\mathrm{Var}(U_k)$ for $\mathrm{Var}(Z_k)$, a standard normal
approximation for $S$ is bound to be conservative.
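
The equality of means and the variance inequality quoted above follow from the tower property and the law of total variance. A short derivation, dropping the pedigree index $k$ and treating the weights as fixed constants (an assumption made here for simplicity), might read:

```latex
\begin{align*}
E(Z) &= E[E(U \mid M)] = E(U), \\
\mathrm{Var}(U) &= \mathrm{Var}[E(U \mid M)] + E[\mathrm{Var}(U \mid M)]
                 = \mathrm{Var}(Z) + E[\mathrm{Var}(U \mid M)]
                 \ge \mathrm{Var}(Z), \\
\mathrm{Var}\Big(\sum_k w_k [Z_k - E(Z_k)]\Big)
  &= \sum_k w_k^2 \,\mathrm{Var}(Z_k)
  \le \sum_k w_k^2 \,\mathrm{Var}(U_k).
\end{align*}
```

Under these assumptions, dividing by $\sqrt{\sum_k w_k^2 \,\mathrm{Var}(U_k)}$ yields a statistic with mean 0 and variance at most 1, which is why referring it to a standard normal distribution tends to overstate p-values.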

It seems that the only remedy is to compute p-values empirically. The
necessary simulations are feasible if done intelligently. The fact that
different pedigrees are independent and contribute additively to each sharing
statistic eases the pain of simulation considerably. Consider a generic sum
$S = X_1 + \cdots + X_n$ of $n$ independent random variables. For example, we
could take
$$X_k = w_k [Z_k - E(Z_k)], \qquad w_k = \sqrt{\frac{r_k}{\mathrm{Var}(U_k)}}.$$
If we want to sample $S$ a million times, we can in principle sample the whole
vector $(X_1, \ldots, X_n)$ a million times and sum. This would be prohibitively
expensive in the pedigree case because of the work involved in simulating
each $X_k$. Simulating a single value of the statistic for one pedigree involves
completely resampling the observed markers by gene dropping and then
recomputing the test statistic in question. Alternatively, we could sample each
$X_k$, say, a hundred times, then construct a million different vectors
$(X_1, \ldots, X_n)$ by repeatedly drawing each $X_k$ independently from its
previously constructed subsample of size one hundred. If $n$ is large and the
variances $\mathrm{Var}(X_k)$ are comparable, then this two-stage procedure is
reasonably accurate and costs a fraction of what the naive procedure costs.
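
The two-stage procedure just described can be sketched in a few lines of Python. This is only an illustration, not the MENDEL implementation: the function names are hypothetical, and `toy_simulate_x` is a cheap Gaussian stand-in for the expensive step of gene dropping and recomputing $X_k$ on pedigree $k$. The one-sided empirical p-value at the end is likewise one simple choice among several.

```python
import random

def build_subsamples(simulate_x, n, m, rng):
    """Stage 1: draw m (expensive) realizations of each X_k, k = 0..n-1."""
    return [[simulate_x(k, rng) for _ in range(m)] for k in range(n)]

def resample_sums(subsamples, draws, rng):
    """Stage 2: build `draws` values of S = X_1 + ... + X_n by picking one
    stored value of each X_k independently from its own subsample."""
    return [sum(rng.choice(pool) for pool in subsamples) for _ in range(draws)]

def empirical_p_value(sums, observed):
    """Fraction of simulated sums at least as large as the observed sum."""
    return sum(s >= observed for s in sums) / len(sums)

def toy_simulate_x(k, rng):
    # Hypothetical stand-in: a real version would gene drop markers on
    # pedigree k and recompute w_k [Z_k - E(Z_k)].
    return rng.gauss(0.0, 1.0)

rng = random.Random(0)                      # fixed seed for reproducibility
pools = build_subsamples(toy_simulate_x, n=50, m=100, rng=rng)
sums = resample_sums(pools, draws=10_000, rng=rng)
p = empirical_p_value(sums, observed=10.0)  # 10.0 is an arbitrary example value
```

With $n$ pedigrees and subsamples of size $m$, the expensive simulator is called only $nm$ times, while the cheap second stage can be repeated as often as needed.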
                                These ideas are now implemented in the latest version of MENDEL.
                              Extensive testing suggests that the statistics displayed in Table 9.2 have
                              the most power against the indicated alternatives [22]. The superscript
                              “rec” in the table refers to the original statistics given in equation (9.15)