Page 256 - Elements of Distribution Theory

052184472Xc08 CUNY148/Severini May 24, 2005 17:54

242 Normal Distribution Theory

and variance
$$\sigma_1^2 - \rho^2 \sigma_1^2 = (1 - \rho^2)\sigma_1^2 .$$
Example 8.7 (Least squares). Let $X$ be a $d$-dimensional random vector with a multivariate
normal distribution with mean $\mu$ and covariance matrix $\Sigma$. Write $X = (X_1, X_2)$, where $X_1$
is real-valued, and partition $\mu$ and $\Sigma$ in a similar manner: $\mu = (\mu_1, \mu_2)$,
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.$$
For a given $1 \times (d - 1)$ matrix $A$ and a given scalar $a \in \mathbb{R}$, define
$$S(A, a) = E[(X_1 - a - A X_2)^2] = (\mu_1 - a - A\mu_2)^2 + \Sigma_{11} + A \Sigma_{22} A^T - 2 \Sigma_{12} A^T$$
and suppose we choose $A$ and $a$ to minimize $S(A, a)$.
First note that, given $A$, $a$ must satisfy
$$a = \mu_1 - A\mu_2 ,$$
so that $(\mu_1 - a - A\mu_2)^2 = 0$. Hence, $A$ may be chosen to minimize
$$A \Sigma_{22} A^T - 2 \Sigma_{12} A^T . \tag{8.1}$$
Write $A = \Sigma_{12} \Sigma_{22}^{-1} + A_1$. Then
$$A \Sigma_{22} A^T - 2 \Sigma_{12} A^T = A_1 \Sigma_{22} A_1^T - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} . \tag{8.2}$$
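The identity (8.2) can be checked by direct expansion. The following is a sketch of that algebra, using only the symmetry of $\Sigma_{22}$ and the relation $\Sigma_{21} = \Sigma_{12}^T$ (which holds because $\Sigma$ is symmetric); the symbol $B = \Sigma_{12}\Sigma_{22}^{-1}$ is shorthand introduced here and does not appear in the text.

$$\begin{aligned}
A \Sigma_{22} A^T &= (B + A_1)\,\Sigma_{22}\,(B + A_1)^T
  = \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} + 2 A_1 \Sigma_{21} + A_1 \Sigma_{22} A_1^T, \\
2\,\Sigma_{12} A^T &= 2\,\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} + 2\,\Sigma_{12} A_1^T .
\end{aligned}$$

Since $A_1 \Sigma_{21}$ is a scalar, it equals its transpose $\Sigma_{12} A_1^T$, so the cross terms cancel on subtraction and what remains is $A_1 \Sigma_{22} A_1^T - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$.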
Minimizing (8.1) with respect to $A$ is equivalent to minimizing (8.2) with respect to $A_1$.
Since $\Sigma_{22}$ is nonnegative-definite, (8.2) is minimized by $A_1 = 0$; hence, (8.1) is minimized
by $A = \Sigma_{12} \Sigma_{22}^{-1}$. That is, the affine function of $X_2$ that minimizes $E\{[X_1 - (a + A X_2)]^2\}$ is
given by
$$\mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (X_2 - \mu_2),$$
which is simply $E(X_1 \mid X_2)$. This is to be expected given Corollary 2.2.
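The minimization in this example can be checked numerically. The sketch below evaluates $S(A, a)$ from its closed form for $d = 3$, computes $A = \Sigma_{12}\Sigma_{22}^{-1}$ and $a = \mu_1 - A\mu_2$, and compares the resulting value with randomly perturbed choices. The values of `mu` and `Sigma`, the helper names, and the hand-coded $2 \times 2$ inverse are all assumptions made for illustration; none of them come from the text.

```python
import random

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def S(A, a, mu, Sigma):
    """Closed form of S(A, a) for d = 3:
    (mu1 - a - A mu2)^2 + S11 + A S22 A^T - 2 S12 A^T."""
    mu1, mu2 = mu[0], mu[1:]
    S11, S12 = Sigma[0][0], Sigma[0][1:]
    S22 = [row[1:] for row in Sigma[1:]]
    bias = mu1 - a - sum(Ai * mi for Ai, mi in zip(A, mu2))
    quad = sum(A[i] * S22[i][j] * A[j] for i in range(2) for j in range(2))
    cross = sum(s12 * Ai for s12, Ai in zip(S12, A))
    return bias ** 2 + S11 + quad - 2 * cross

def least_squares_coefficients(mu, Sigma):
    """Return A = S12 S22^{-1} (as a list) and a = mu1 - A mu2."""
    S12 = Sigma[0][1:]
    S22_inv = inv2([row[1:] for row in Sigma[1:]])
    A = [sum(S12[i] * S22_inv[i][j] for i in range(2)) for j in range(2)]
    a = mu[0] - sum(Ai * mi for Ai, mi in zip(A, mu[1:]))
    return A, a

# Illustrative (assumed) mean vector and positive-definite covariance matrix.
mu = [1.0, 0.5, -2.0]
Sigma = [[2.0, 0.6, 0.3],
         [0.6, 1.0, 0.2],
         [0.3, 0.2, 1.5]]

A_star, a_star = least_squares_coefficients(mu, Sigma)
S_star = S(A_star, a_star, mu, Sigma)

# Perturb (A, a) at random; no perturbed value should fall below S_star.
random.seed(0)
perturbed = [
    S([Ai + random.gauss(0, 0.1) for Ai in A_star],
      a_star + random.gauss(0, 0.1), mu, Sigma)
    for _ in range(1000)
]
print(S_star, min(perturbed))
```

At the minimizer the value of $S$ reduces to $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, which is the constant term in (8.2) added to $\Sigma_{11}$.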
Conditioning on a degenerate random variable
Theorem 8.3 may be extended to the case in which the conditioning random vector, $X_2$, has
a singular covariance matrix.
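As a small illustration of a singular conditioning covariance, consider the hypothetical case $X_2 = (Z, 2Z)$ for a standard normal $Z$, with $X_1$ equal to $Z$ plus independent noise and all means zero. The sketch below assumes that the generalized inverse appearing in the conditional mean of Theorem 8.4 can be taken to be the Moore-Penrose pseudoinverse; that choice, the setup, and the helper names are assumptions made here, not statements from the text.

```python
def rank1_pinv(v):
    """Moore-Penrose pseudoinverse of the rank-one matrix v v^T,
    which equals v v^T / |v|^4."""
    norm4 = sum(x * x for x in v) ** 2
    return [[vi * vj / norm4 for vj in v] for vi in v]

# Assumed setup: X2 = (Z, 2Z), so Sigma22 = v v^T with v = (1, 2).
v = [1.0, 2.0]
Sigma22 = [[vi * vj for vj in v] for vi in v]          # [[1, 2], [2, 4]]
det22 = Sigma22[0][0] * Sigma22[1][1] - Sigma22[0][1] * Sigma22[1][0]

# Cov(X1, X2) when X1 = Z + independent noise.
Sigma12 = [1.0, 2.0]

# Coefficient row vector Sigma12 * pinv(Sigma22).
Sigma22_pinv = rank1_pinv(v)
coef = [sum(Sigma12[i] * Sigma22_pinv[i][j] for i in range(2)) for j in range(2)]

def conditional_mean(x2, mu1=0.0, mu2=(0.0, 0.0)):
    """mu1 + Sigma12 * pinv(Sigma22) * (x2 - mu2)."""
    return mu1 + sum(c * (x - m) for c, x, m in zip(coef, x2, mu2))

print(det22, coef, conditional_mean([3.0, 6.0]))
```

Here `det22` is zero, so $\Sigma_{22}$ has rank $1 < 2$ and the ordinary inverse does not exist, yet for any observed value of the form $x_2 = (z, 2z)$ the pseudoinverse formula returns $z$, as one would expect when $X_2$ determines $Z$ exactly.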

Theorem 8.4. Let $X$ be a $d$-dimensional random vector with a multivariate normal distribution
with mean $\mu$ and covariance matrix $\Sigma$.
Write $X = (X_1, X_2)$ where $X_1$ is $p$-dimensional and $X_2$ is $(d - p)$-dimensional,
$\mu = (\mu_1, \mu_2)$ where $\mu_1 \in \mathbb{R}^p$ and $\mu_2 \in \mathbb{R}^{d-p}$, and
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$
where $\Sigma_{11}$ is $p \times p$, $\Sigma_{12} = \Sigma_{21}^T$ is $p \times (d - p)$, and $\Sigma_{22}$ is $(d - p) \times (d - p)$. Let
$r = \operatorname{rank}(\Sigma_{22})$ and suppose that $r < d - p$. Then the conditional distribution of $X_1$ given
$X_2 = x_2$ is a multivariate normal distribution with mean vector
$$\mu_1 + \Sigma_{12} \Sigma_{22}^{-} (x_2 - \mu_2)$$
