Page 69 - Elements of Distribution Theory


2.4 Conditional Expectation

Example 2.15 (A mixed distribution). Let (X, Y) denote a two-dimensional random vector
with the distribution described in Example 2.3 and considered further in Example 2.13.
Recall that the conditional distribution of X given Y = y is absolutely continuous with
density function y exp(−yx), x > 0. It follows that

E(X|Y = y) = 1/y.
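
The step from the density to the conditional mean is a single integration by parts. The following is a sketch of that calculation, assuming y > 0 so that y exp(−yx) is a proper density on x > 0:

```latex
\[
E(X \mid Y = y)
  = \int_0^\infty x \, y e^{-yx} \, dx
  = \Bigl[ -x e^{-yx} \Bigr]_0^\infty + \int_0^\infty e^{-yx} \, dx
  = 0 + \frac{1}{y}
  = \frac{1}{y}, \qquad y > 0 .
\]
```

Equivalently, the conditional distribution is exponential with rate y, whose mean is 1/y.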

The following theorem gives several properties of conditional expected values. These
follow immediately from the properties of integrals, as described in Appendix 1, and, hence,
the proof is left as an exercise. In describing these results, we write that a property holds for
“almost all y (F_Y)” if the set of y ∈ Y for which the property does not hold has probability
0 under F_Y.

Theorem 2.4. Let (X, Y) denote a random vector with range X × Y; note that X and Y
may each be vectors. Let g_1, ..., g_m denote real-valued functions defined on X such that
E[|g_j(X)|] < ∞, j = 1, ..., m. Then
(i) If g_1 is nonnegative, then

E[g_1(X)|Y = y] ≥ 0 for almost all y (F_Y).

(ii) If g_1 is constant, g_1(x) ≡ c, then

E[g_1(X)|Y = y] = c for almost all y (F_Y).

(iii) For almost all y (F_Y),

E[g_1(X) + ··· + g_m(X)|Y = y] = E[g_1(X)|Y = y] + ··· + E[g_m(X)|Y = y].

Note that E[g(X)|Y = y] is a function of y, which we may denote, for example, by f(y).
It is often convenient to consider the random variable f(Y), which we denote by E[g(X)|Y]
and call the conditional expected value of g(X) given Y. This random variable is a function
of Y, yet it retains some of the properties of g(X). According to (2.5), E[g(X)|Y] is any
function of Y satisfying

E{g(X) I_{Y∈B}} = E{E[g(X)|Y] I_{Y∈B}} for all B ⊂ Y. (2.6)
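
For a discrete random vector, (2.6) can be checked directly by enumerating every subset B of the range of Y. The sketch below does this with exact rational arithmetic. The joint probabilities and the function g are hypothetical, chosen only for illustration; they are not taken from the text.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical joint pmf p(x, y) on {0, 1, 2} x {0, 1, 2} (illustrative values only).
pmf = {
    (0, 0): Fraction(1, 12), (1, 0): Fraction(2, 12), (2, 0): Fraction(1, 12),
    (0, 1): Fraction(2, 12), (1, 1): Fraction(1, 12), (2, 1): Fraction(1, 12),
    (0, 2): Fraction(1, 12), (1, 2): Fraction(1, 12), (2, 2): Fraction(2, 12),
}

def g(x):
    # Hypothetical choice of g; any real-valued function on the range of X would do.
    return x * x

y_values = sorted({y for (_, y) in pmf})

def cond_exp(y):
    """E[g(X) | Y = y] = sum_x g(x) p(x, y) / P(Y = y)."""
    p_y = sum(p for (_, yy), p in pmf.items() if yy == y)
    return sum(g(x) * p for (x, yy), p in pmf.items() if yy == y) / p_y

def lhs(B):
    """E{ g(X) I_{Y in B} }."""
    return sum(g(x) * p for (x, y), p in pmf.items() if y in B)

def rhs(B):
    """E{ E[g(X)|Y] I_{Y in B} }."""
    return sum(cond_exp(y) * p for (_, y), p in pmf.items() if y in B)

# Every subset B of the range of Y, including the empty set and the full range.
subsets = [set(c) for r in range(len(y_values) + 1)
           for c in combinations(y_values, r)]
holds_for_all_B = all(lhs(B) == rhs(B) for B in subsets)
```

Taking B to be the whole range of Y reduces (2.6) to E[g(X)] = E{E[g(X)|Y]}.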

The following result gives a number of useful properties of conditional expected values.

Theorem 2.5. Let (X, Y) denote a random vector with range X × Y, let T : Y → T denote
a function on Y, let g denote a real-valued function on X such that E[|g(X)|] < ∞, and
let h denote a real-valued function on Y such that E[|g(X)h(Y)|] < ∞. Then
(i) E{E[g(X)|Y]} = E[g(X)]
(ii) E[g(X)h(Y)|Y] = h(Y)E[g(X)|Y] with probability 1
(iii) E[g(X)|Y, T(Y)] = E[g(X)|Y] with probability 1
(iv) E[g(X)|T(Y)] = E{E[g(X)|Y]|T(Y)} with probability 1
(v) E[g(X)|T(Y)] = E{E[g(X)|T(Y)]|Y} with probability 1
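
Parts (i), (ii), (iv), and (v) can be illustrated on a small discrete example in which every conditional expectation is a finite sum. The following is a minimal sketch using exact rational arithmetic. The joint pmf and the functions g, h, and T are hypothetical choices made for illustration, and the helper names `expect` and `cond_exp` are not from the text.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y) with x in {0, 1} and y in {0, 1, 2, 3}.
pmf = {
    (0, 0): Fraction(1, 16), (1, 0): Fraction(3, 16),
    (0, 1): Fraction(2, 16), (1, 1): Fraction(2, 16),
    (0, 2): Fraction(3, 16), (1, 2): Fraction(1, 16),
    (0, 3): Fraction(1, 16), (1, 3): Fraction(3, 16),
}

def g(x): return 2 * x + 1   # hypothetical g
def h(y): return y + 1       # hypothetical h
def T(y): return y % 2       # hypothetical T: a coarsening of Y

def expect(f):
    """E[f(X, Y)] under the joint pmf."""
    return sum(f(x, y) * p for (x, y), p in pmf.items())

def cond_exp(f, key):
    """Return {k: E[f(X, Y) | key(Y) = k]} for each value k of key(Y)."""
    num, den = {}, {}
    for (x, y), p in pmf.items():
        k = key(y)
        num[k] = num.get(k, 0) + f(x, y) * p
        den[k] = den.get(k, 0) + p
    return {k: num[k] / den[k] for k in num}

# E[g(X) | Y = y] as a lookup table in y.
inner = cond_exp(lambda x, y: g(x), lambda y: y)

# (i) E{E[g(X)|Y]} = E[g(X)]
part_i = expect(lambda x, y: inner[y]) == expect(lambda x, y: g(x))

# (ii) E[g(X)h(Y)|Y] = h(Y) E[g(X)|Y]
gh = cond_exp(lambda x, y: g(x) * h(y), lambda y: y)
part_ii = all(gh[y] == h(y) * inner[y] for y in inner)

# (iv) E[g(X)|T(Y)] = E{E[g(X)|Y] | T(Y)}
direct = cond_exp(lambda x, y: g(x), T)
part_iv = direct == cond_exp(lambda x, y: inner[y], T)

# (v) E[g(X)|T(Y)] is already a function of Y, so conditioning on Y leaves it unchanged.
again = cond_exp(lambda x, y: direct[T(y)], lambda y: y)
part_v = all(again[y] == direct[T(y)] for y in again)
```

In this example all four flags evaluate to True. Because every value of Y has positive probability here, the equalities hold for every y, not only with probability 1.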
