Page 363 - Applied Probability
Appendix B: The Normal Distribution
matrix $I$, and characteristic function
$$E\left(e^{is^tX}\right) = \prod_{j=1}^{n} e^{-s_j^2/2} = e^{-s^ts/2}.$$
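As a quick numerical sanity check of this identity, the sketch below estimates $E(e^{is^tX})$ by Monte Carlo for a vector $X$ of independent standard normal components and compares the estimate with $e^{-s^ts/2}$. It assumes NumPy; the helper name `empirical_cf`, the test vector $s$, and the sample size are illustrative choices, not part of the text.

```python
import numpy as np

def empirical_cf(s, num_samples=200_000, seed=0):
    """Monte Carlo estimate of E[exp(i s^t X)] for X with iid N(0, 1) components."""
    rng = np.random.default_rng(seed)
    s = np.asarray(s, dtype=float)
    X = rng.standard_normal((num_samples, s.size))  # each row is one draw of X
    return np.mean(np.exp(1j * (X @ s)))            # sample average of e^{i s^t X}

s = np.array([0.5, -1.0, 0.3])
estimate = empirical_cf(s)
exact = np.exp(-(s @ s) / 2)                       # e^{-s^t s / 2}
print(estimate, exact)
```

The imaginary part of the estimate should be near zero, reflecting the symmetry of the standard normal distribution.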
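We now define any affine transformation $Y = AX + \mu$ of $X$ to be multivariate normal [2]. This definition has several practical consequences. First, it is clear that $E(Y) = \mu$ and $\operatorname{Var}(Y) = A\operatorname{Var}(X)A^t = AA^t = \Omega$. Second, any affine transformation $BY + \nu = BAX + B\mu + \nu$ of $Y$ is also multivariate normal, since it is itself an affine transformation of $X$. Third, any subvector of $Y$ is multivariate normal, because it arises from the special case in which $B$ selects a subset of coordinates and $\nu = 0$. Fourth, evaluating the characteristic function of $X$ at the argument $A^ts$ shows that the characteristic function of $Y$ is
$$E\left(e^{is^tY}\right) = e^{is^t\mu}E\left(e^{is^tAX}\right) = e^{is^t\mu - s^tAA^ts/2} = e^{is^t\mu - s^t\Omega s/2}.$$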
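The first consequence, $E(Y) = \mu$ and $\operatorname{Var}(Y) = AA^t$, can be checked by simulation. The sketch below assumes NumPy and uses an arbitrary, hypothetical $2 \times 3$ matrix $A$ and mean $\mu$; it draws many copies of $X$, forms $Y = AX + \mu$, and compares sample moments with the theoretical ones.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical example: m = 2 components of Y driven by n = 3 standard normals.
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 2.0, -1.0]])
mu = np.array([3.0, -1.0])
Omega = A @ A.T                                     # theoretical Var(Y) = A A^t

num_samples = 200_000
X = rng.standard_normal((num_samples, A.shape[1]))  # rows are draws of X
Y = X @ A.T + mu                                    # rows are draws of Y = AX + mu

sample_mean = Y.mean(axis=0)
sample_cov = np.cov(Y, rowvar=False)                # columns are the variables
print(sample_mean, mu)
print(sample_cov, Omega, sep="\n")
```

Writing `X @ A.T` rather than `A @ X` simply reflects the convention of storing one draw per row.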
This enumeration omits two more subtle issues. One is whether $Y$ possesses a density. Observe that $Y$ lives in an affine subspace of dimension equal to or less than the rank of $A$. Thus, if $Y$ has $m$ components, then $A$ is $m \times n$ with $\operatorname{rank}(A) \le \min(m, n)$, so $n \ge m$ must hold in order for $Y$ to possess a density on $\mathbb{R}^m$. A second issue is the existence and nature of the conditional density of a set of components of $Y$ given the remaining components. We can clarify both of these issues by making canonical choices of $X$ and $A$ based on the classical QR decomposition of a matrix, which follows directly from the Gram-Schmidt orthogonalization procedure [1].
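The rank obstruction can be seen concretely. When $n < m$, the covariance $\Omega = AA^t$ has rank at most $n$, so it is singular and $Y$ is confined to a lower-dimensional affine subspace of $\mathbb{R}^m$. The sketch below assumes NumPy and a hypothetical $3 \times 2$ matrix $A$ chosen only for illustration.

```python
import numpy as np

# Hypothetical case with fewer driving variables than components: n = 2 < m = 3.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])   # third component of AX is the sum of the first two
Omega = A @ A.T              # 3 x 3 covariance matrix of Y = AX + mu

rank_A = np.linalg.matrix_rank(A)          # at most min(m, n) = 2
rank_Omega = np.linalg.matrix_rank(Omega)  # rank(A A^t) = rank(A) < m
det_Omega = np.linalg.det(Omega)           # zero up to rounding error
print(rank_A, rank_Omega, det_Omega)
```

Because $\det \Omega = 0$ here, no density of the form $|\det \Omega|^{-1/2}e^{-(y-\mu)^t\Omega^{-1}(y-\mu)/2}$ can exist.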
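Assuming that $n \ge m$, we can write
$$A^t = Q\begin{pmatrix} R \\ \mathbf{0} \end{pmatrix},$$
where $Q$ is an $n \times n$ orthogonal matrix and $R$ is an $m \times m$ upper triangular matrix with nonnegative diagonal entries. (If $n = m$, we omit the zero matrix in the QR decomposition.) Transposing this equation and writing $L = R^t$ for the resulting lower triangular matrix, we find that
$$AX = \begin{pmatrix} L & \mathbf{0}^t \end{pmatrix} Q^tX = \begin{pmatrix} L & \mathbf{0}^t \end{pmatrix} Z,$$
where $Z = Q^tX$.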
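This canonical factorization can be computed numerically. The sketch below assumes NumPy, and `canonical_factor` is a hypothetical helper name. It takes the complete QR decomposition of $A^t$, flips signs so that $R$ has a nonnegative diagonal (NumPy does not promise this), sets $L = R^t$, and checks that $A = (L \;\; \mathbf{0}^t)Q^t$. For a full-rank $A$ it also confirms that $L$ matches the Cholesky factor of $\Omega = AA^t$.

```python
import numpy as np

def canonical_factor(A):
    """Return (L, Q) with A = [L 0^t] Q^t, L = R^t lower triangular, diag(L) >= 0.

    Uses the complete QR decomposition of A^t; assumes A is m x n with n >= m.
    """
    m, n = A.shape
    Q, R = np.linalg.qr(A.T, mode="complete")  # A^t = Q R, Q is n x n, R is n x m
    signs = np.sign(np.diag(R[:m]))
    signs[signs == 0] = 1.0                    # leave zero pivots unchanged
    Q[:, :m] *= signs                          # flip columns of Q ...
    R[:m] *= signs[:, None]                    # ... and matching rows of R
    L = R[:m].T                                # m x m lower triangular
    return L, Q

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))                # hypothetical m = 3, n = 5
L, Q = canonical_factor(A)

m, n = A.shape
block = np.hstack([L, np.zeros((m, n - m))])   # the matrix (L 0^t)
print(np.allclose(block @ Q.T, A))             # A = (L 0^t) Q^t
print(np.allclose(L @ L.T, A @ A.T))           # L L^t = A A^t = Omega
print(np.allclose(L, np.linalg.cholesky(A @ A.T)))
```

The sign flip is harmless because multiplying a column of $Q$ and the matching row of $R$ by $-1$ leaves the product $QR$ unchanged, and $Q$ stays orthogonal.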
In view of the usual change of variables formula for probability densities and the facts that the orthogonal matrix $Q^t$ preserves inner products and has determinant $\pm 1$, the random vector $Z$ has $n$ independent, standard normal components and serves as a substitute for $X$. Not only is this true, but we can dispense with the last $n - m$ components of $Z$ because they are multiplied by the matrix $\mathbf{0}^t$. Thus, we can safely assume $n = m$ and calculate the density of $Y = LZ + \mu$ when $L$ is invertible. In this situation, $\Omega = LL^t$ is termed the Cholesky decomposition, and the usual change of variables formula, together with the identities $(L^{-1})^tL^{-1} = (LL^t)^{-1} = \Omega^{-1}$ and $|\det L^{-1}| = |\det \Omega|^{-1/2}$, shows that $Y$ has density
$$\begin{aligned}
f(y) &= \left(\frac{1}{2\pi}\right)^{n/2} \left|\det L^{-1}\right| e^{-(y-\mu)^t(L^{-1})^tL^{-1}(y-\mu)/2} \\
&= \left(\frac{1}{2\pi}\right)^{n/2} \left|\det \Omega\right|^{-1/2} e^{-(y-\mu)^t\Omega^{-1}(y-\mu)/2},
\end{aligned}$$

