Page 257 - Elements of Distribution Theory

P1: JZP
            052184472Xc08  CUNY148/Severini  May 24, 2005  17:54





                                                 8.3 Conditional Distributions               243

and covariance matrix

\[ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-}\Sigma_{21}, \]

provided that $x_2$ is such that for any vector $a$ satisfying $a^T\Sigma_{22}a = 0$, $a^T x_2 = a^T\mu_2$. Here $\Sigma_{22}^{-}$ denotes the Moore–Penrose generalized inverse of $\Sigma_{22}$.

Proof. By part (iii) of Theorem 8.1, there is a linear transformation of $X_2$ to $(Y_1, Y_2)$ such that $Y_1$ is constant with probability 1 and $Y_2$ has a multivariate normal distribution with full-rank covariance matrix. Furthermore, $Y_2 = CX_2$ where $C$ is an $r \times (d - p)$ matrix with rows taken to be the eigenvectors corresponding to nonzero eigenvalues of $\Sigma_{22}$; see the proof of Theorem 8.1. Hence, the conditional distribution of $X_1$ given $X_2$ is equivalent to the conditional distribution of $X_1$ given $Y_2$. Since the covariance matrix of $Y_2$ is of full rank, it follows from Theorem 8.3 that this conditional distribution is multivariate normal with mean vector

\[ \mu_1 + \Sigma_{13}\Sigma_{33}^{-1}(y_2 - \mu_3) \]

and covariance matrix

\[ \Sigma_{11} - \Sigma_{13}\Sigma_{33}^{-1}\Sigma_{31}, \]

where $\mu_3$ denotes the mean of $Y_2$, $\Sigma_{13}$ denotes the covariance of $X_1$ and $Y_2$, and $\Sigma_{33}$ denotes the covariance matrix of $Y_2$.
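
The reduction from $X_2$ to $Y_2 = CX_2$ can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `full_rank_reduction`, the tolerance `tol`, and the example matrix are hypothetical choices made for illustration and are not taken from the text. It builds $C$ from the eigenvectors of a singular $\Sigma_{22}$ that have nonzero eigenvalues and forms the covariance matrix of $Y_2$.

```python
import numpy as np

def full_rank_reduction(sigma22, tol=1e-10):
    """Return (C, d): the rows of C are eigenvectors of sigma22 with
    nonzero eigenvalues, and d holds those eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(sigma22)      # symmetric eigendecomposition
    keep = eigvals > tol * max(eigvals.max(), 1.0)  # treat tiny eigenvalues as zero
    C = eigvecs[:, keep].T                          # r x (d - p) matrix
    return C, eigvals[keep]

# Hypothetical singular covariance matrix: the third coordinate of X_2
# is the sum of the first two, so the rank is 2 rather than 3.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
sigma22 = A @ A.T

C, d = full_rank_reduction(sigma22)
sigma33 = C @ sigma22 @ C.T   # covariance matrix of Y_2 = C X_2

print("shape of C:", C.shape)
print("covariance of Y_2:\n", np.round(sigma33, 10))
```

Because the rows of $C$ are orthonormal eigenvectors, the covariance matrix of $Y_2$ computed this way is diagonal with the nonzero eigenvalues on the diagonal, hence of full rank.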
By considering the transformation

\[ \begin{pmatrix} X_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} I_p & 0 \\ 0 & C \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \]

it follows from Theorem 8.1 that

\[ \Sigma_{13} = \Sigma_{12}C^T, \]

\[ \Sigma_{33} = C\Sigma_{22}C^T, \]

and $\mu_3 = C\mu_2$. Hence, the conditional distribution of $X_1$ given $X_2 = x_2$ is multivariate normal with mean

\[ \mu_1 + \Sigma_{12}C^T[C\Sigma_{22}C^T]^{-1}C(x_2 - \mu_2) \]

and covariance matrix

\[ \Sigma_{11} - \Sigma_{12}C^T[C\Sigma_{22}C^T]^{-1}C\Sigma_{21}, \]

provided that $x_2$ is such that for any vector $a$ such that $a^T X_2$ has variance 0, $a^T x_2 = a^T\mu_2$; see Example 8.8 below for an illustration of this requirement.
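
The mean and covariance formulas above can be evaluated directly. The following is a minimal sketch, assuming NumPy; the function name `conditional_normal`, the tolerances, and the example parameters are hypothetical and chosen only for illustration. It checks the requirement on $x_2$ using a basis of the null space of $\Sigma_{22}$ (which suffices because the requirement is linear in $a$), then computes the conditional mean and covariance matrix. Since the theorem states that the result is expressed through the Moore–Penrose generalized inverse, the matrix $C^T[C\Sigma_{22}C^T]^{-1}C$ is also returned so it can be compared with `np.linalg.pinv`.

```python
import numpy as np

def conditional_normal(mu1, mu2, s11, s12, s22, x2, tol=1e-10):
    """Conditional mean and covariance of X_1 given X_2 = x2 when s22
    may be singular. Also returns G = C^T [C s22 C^T]^{-1} C."""
    eigvals, eigvecs = np.linalg.eigh(s22)
    keep = eigvals > tol * max(eigvals.max(), 1.0)
    C = eigvecs[:, keep].T       # eigenvectors with nonzero eigenvalues
    N = eigvecs[:, ~keep].T      # basis of vectors a with a^T s22 a = 0
    # Requirement on x2: a^T x2 = a^T mu2 for every such vector a.
    if N.size and not np.allclose(N @ (x2 - mu2), 0.0, atol=1e-8):
        raise ValueError("x2 violates a^T x2 = a^T mu2 for some a "
                         "with a^T s22 a = 0")
    G = C.T @ np.linalg.inv(C @ s22 @ C.T) @ C
    mean = mu1 + s12 @ G @ (x2 - mu2)
    cov = s11 - s12 @ G @ s12.T
    return mean, cov, G

# Hypothetical example: X = mu + B Z with Z standard normal in R^3, so that
# X_1 = 1 + Z_0 + Z_2 and X_2 = (Z_0, Z_1, Z_0 + Z_1) has a singular covariance.
B = np.array([[1.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])
sigma = B @ B.T
s11, s12, s22 = sigma[:1, :1], sigma[:1, 1:], sigma[1:, 1:]
mu1, mu2 = np.array([1.0]), np.zeros(3)

x2 = np.array([0.5, -1.0, -0.5])   # third entry equals the sum of the first two
mean, cov, G = conditional_normal(mu1, mu2, s11, s12, s22, x2)

print("conditional mean:", mean)
print("conditional covariance:", cov)
print("matches pinv:", np.allclose(G, np.linalg.pinv(s22)))
```

In this construction the answer can be read off by hand: given $Z_0 = 0.5$ and $Z_1 = -1$, the only remaining randomness in $X_1$ is $Z_2$. A value of `x2` whose third entry is not the sum of the first two lies outside the support of $X_2$, and the sketch rejects it.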

Recall that $\Sigma_{22} = C^T DC$ where $D$ is a diagonal matrix with diagonal elements taken to be the nonzero eigenvalues of $\Sigma_{22}$. Note that

\[ \Sigma_{22}C^T[C\Sigma_{22}C^T]^{-1}C\Sigma_{22} = C^T D(CC^T)[(CC^T)D(CC^T)]^{-1}(CC^T)DC = C^T DC = \Sigma_{22}, \]

since $(CC^T)$ and $D$ are invertible. Hence,

\[ \Sigma_{22}^{\dagger} \equiv C^T[C\Sigma_{22}C^T]^{-1}C \]