Page 256 - Elements of Distribution Theory






Normal Distribution Theory

and variance
\[ \sigma_1^2 - \rho^2 \sigma_1^2 = (1 - \rho^2)\sigma_1^2 . \]
Example 8.7 (Least squares). Let $X$ be a $d$-dimensional random vector with a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. Write $X = (X_1, X_2)$, where $X_1$ is real-valued, and partition $\mu$ and $\Sigma$ in a similar manner: $\mu = (\mu_1, \mu_2)$,
\[ \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} . \]
For a given $1 \times (d - 1)$ matrix $A$ and a given scalar $a \in \mathbf{R}$, define
\[ S(A, a) = E[(X_1 - a - A X_2)^2] = (\mu_1 - a - A\mu_2)^2 + \Sigma_{11} + A \Sigma_{22} A^T - 2 \Sigma_{12} A^T \]
and suppose we choose $A$ and $a$ to minimize $S(A, a)$.
First note that, given $A$, $a$ must satisfy
\[ a = \mu_1 - A\mu_2 , \]
so that $(\mu_1 - a - A\mu_2)^2 = 0$. Hence, $A$ may be chosen to minimize
\[ A \Sigma_{22} A^T - 2 \Sigma_{12} A^T . \tag{8.1} \]
Write $A = \Sigma_{12} \Sigma_{22}^{-1} + A_1$. Then
\[ A \Sigma_{22} A^T - 2 \Sigma_{12} A^T = A_1 \Sigma_{22} A_1^T - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} . \tag{8.2} \]
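The identity (8.2) follows by direct expansion. As a sketch of the omitted step, substitute $A = \Sigma_{12}\Sigma_{22}^{-1} + A_1$ and use the symmetry of $\Sigma_{22}$, the relation $\Sigma_{21} = \Sigma_{12}^T$, and the fact that $A_1 \Sigma_{21}$ is a scalar, so that $A_1 \Sigma_{21} = \Sigma_{12} A_1^T$. Invertibility of $\Sigma_{22}$ is assumed here, as the notation $\Sigma_{22}^{-1}$ implies.
\[
\begin{aligned}
A \Sigma_{22} A^T &= \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} + 2\,\Sigma_{12} A_1^T + A_1 \Sigma_{22} A_1^T , \\
2\,\Sigma_{12} A^T &= 2\,\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} + 2\,\Sigma_{12} A_1^T .
\end{aligned}
\]
Subtracting the second line from the first cancels the cross terms $2\,\Sigma_{12} A_1^T$ and leaves $A_1 \Sigma_{22} A_1^T - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$.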
Minimizing (8.1) with respect to $A$ is equivalent to minimizing (8.2) with respect to $A_1$. Since $\Sigma_{22}$ is nonnegative-definite, (8.2) is minimized by $A_1 = 0$; hence, (8.1) is minimized by $A = \Sigma_{12} \Sigma_{22}^{-1}$. That is, the affine function of $X_2$ that minimizes $E\{[X_1 - (a + A X_2)]^2\}$ is given by
\[ \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (X_2 - \mu_2) , \]
which is simply $E(X_1 | X_2)$. This is to be expected given Corollary 2.2.
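The result of Example 8.7 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the values of `mu` and `Sigma` are hypothetical illustrations (not taken from the text), chosen so that the covariance matrix is positive definite with d = 3. It evaluates S(A, a) from the closed-form expression, confirms that the minimum equals Σ11 − Σ12 Σ22^{-1} Σ21, and checks that random perturbations of (A, a) never give a smaller value.

```python
import numpy as np

# Hypothetical example values (not from the text): d = 3, X_1 scalar, X_2 in R^2.
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

# Partition mu and Sigma as in the example.
mu1, mu2 = mu[0], mu[1:]
S11 = Sigma[0, 0]
S12 = Sigma[0:1, 1:]   # 1 x (d - 1)
S22 = Sigma[1:, 1:]    # (d - 1) x (d - 1)


def S(A, a):
    """Closed form of E[(X_1 - a - A X_2)^2] for a 1 x (d - 1) matrix A."""
    A = np.atleast_2d(A)
    bias = mu1 - a - (A @ mu2).item()
    quad = (A @ S22 @ A.T).item()
    cross = (S12 @ A.T).item()
    return bias**2 + S11 + quad - 2.0 * cross


# Minimizer from the example: A = S12 S22^{-1}, a = mu1 - A mu2.
# S22 is symmetric, so solve(S22, S12^T)^T equals S12 S22^{-1}.
A_star = np.linalg.solve(S22, S12.T).T
a_star = mu1 - (A_star @ mu2).item()
S_min = S(A_star, a_star)

# The minimum value should equal S11 - S12 S22^{-1} S21.
schur = S11 - (S12 @ np.linalg.solve(S22, S12.T)).item()

# Random perturbations of (A, a) should never do better than the minimizer.
rng = np.random.default_rng(0)
never_better = all(
    S(A_star + rng.normal(size=A_star.shape), a_star + rng.normal()) >= S_min
    for _ in range(1000)
)

print("A* =", A_star, "a* =", a_star)
print("S(A*, a*) =", S_min, " Schur complement =", schur)
print("no perturbation improves on the minimizer:", never_better)
```

Using `np.linalg.solve` rather than forming an explicit inverse is a design choice for numerical stability; it requires Σ22 to be nonsingular, which matches the setting of this example.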
                            Conditioning on a degenerate random variable
Theorem 8.3 may be extended to the case in which the conditioning random vector, $X_2$, has a singular covariance matrix.


Theorem 8.4. Let $X$ be a $d$-dimensional random vector with a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$.
Write $X = (X_1, X_2)$ where $X_1$ is $p$-dimensional and $X_2$ is $(d - p)$-dimensional, $\mu = (\mu_1, \mu_2)$ where $\mu_1 \in \mathbf{R}^p$ and $\mu_2 \in \mathbf{R}^{d-p}$, and
\[ \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \]
where $\Sigma_{11}$ is $p \times p$, $\Sigma_{12} = \Sigma_{21}^T$ is $p \times (d - p)$, and $\Sigma_{22}$ is $(d - p) \times (d - p)$. Let $r = \mathrm{rank}(\Sigma_{22})$ and suppose that $r < d - p$. Then the conditional distribution of $X_1$ given $X_2 = x_2$ is a multivariate normal distribution with mean vector
\[ \mu_1 + \Sigma_{12} \Sigma_{22}^{-} (x_2 - \mu_2) \]