
Let $\sigma_X^2 = \operatorname{Var}(X)$ and $\sigma_Y^2 = \operatorname{Var}(Y)$. Taking

$$a = \frac{\sigma_Y}{\sigma_X}\,\rho(X, Y)$$

and

$$b = \mu_Y - \frac{\sigma_Y}{\sigma_X}\,\rho(X, Y)\,\mu_X,$$

we have that

$$\operatorname{Var}\!\left(Y - \frac{\sigma_Y}{\sigma_X}\,\rho(X, Y)\,X\right)
= \operatorname{Var}(Y) + \rho(X, Y)^2\,\operatorname{Var}(Y) - 2\,\frac{\sigma_Y}{\sigma_X}\,\rho(X, Y)\operatorname{Cov}(X, Y)
= \bigl(1 - \rho(X, Y)^2\bigr)\operatorname{Var}(Y),$$

using $\operatorname{Cov}(X, Y) = \rho(X, Y)\,\sigma_X\,\sigma_Y$ in the final step. Therefore, we must have $\rho(X, Y) = 0$, proving part (iii).
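
As a quick numerical check of this identity (a sketch in Python with NumPy, which the text itself does not use; the simulated distribution is an assumed example), the following simulates correlated pairs, forms the best linear predictor $aX + b$ with the $a$ and $b$ above, and compares its mean squared error with $(1 - \rho(X, Y)^2)\operatorname{Var}(Y)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative correlated pair: Y = 0.5 X + noise (an assumed example).
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)

rho = np.corrcoef(x, y)[0, 1]        # sample correlation rho(X, Y)
a = (y.std() / x.std()) * rho        # a = (sigma_Y / sigma_X) rho(X, Y)
b = y.mean() - a * x.mean()          # b = mu_Y - a mu_X

# E{[Y - (aX + b)]^2} should match (1 - rho^2) Var(Y).
print(np.mean((y - (a * x + b)) ** 2))
print((1 - rho**2) * np.var(y))
```

The two printed values agree to floating-point precision, since the identity holds exactly for sample moments as well as for population moments.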


According to Theorem 4.3, $0 \le \rho(X, Y)^2 \le 1$, with $\rho(X, Y)^2 = 1$ if and only if $Y$ is, with probability 1, a linear function of $X$. Part (iii) of the theorem states that $\rho(X, Y)^2 = 0$ if and only if the linear function of $X$ that best predicts $Y$, in the sense of the criterion $E\{[Y - (aX + b)]^2\}$, is the function with $a = 0$ and $b = E(Y)$; that is, $X$ is of no help in predicting $Y$, at least if we restrict attention to linear functions of $X$. The restriction to linear functions is crucial, as the following example illustrates.
Example 4.5 (Laplace distribution). Let $X$ denote a random variable with an absolutely continuous distribution with density function

$$p(x) = \frac{1}{2}\exp\{-|x|\}, \quad -\infty < x < \infty;$$

this distribution is called the Laplace distribution. Note that $E(X) = 0$, $E(X^2) = 2$, and $E(X^3) = 0$.

Let $Y = X^2$. Then

$$\operatorname{Cov}(Y, X) = E[(Y - 2)X] = E[X^3 - 2X] = 0,$$

so that $\rho(Y, X) = 0$. Hence, linear functions of $X$ are not useful for predicting $Y$. However, there are clearly nonlinear functions of $X$ that are useful for predicting $Y$; in particular, $X^2$ yields $Y$ exactly.
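
A small simulation makes this concrete (again a Python/NumPy sketch of my own; `numpy.random.Generator.laplace` with `scale=1` draws from exactly the density above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Standard Laplace samples, density p(x) = (1/2) exp(-|x|).
x = rng.laplace(size=500_000)
y = x**2

# The sample correlation of (Y, X) is near 0, so the best *linear*
# predictor of Y is essentially the constant E(Y) = 2, with mean
# squared error Var(Y) ...
print(np.corrcoef(y, x)[0, 1])
print(np.mean((y - y.mean()) ** 2))

# ... while the nonlinear predictor X^2 recovers Y exactly.
print(np.mean((y - x**2) ** 2))   # 0.0
```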

                            Covariance matrices
Joint moments and joint central moments for sets of more than two real-valued random variables may be defined in a similar manner. For instance, the joint moment of $(X_1, \ldots, X_d)$ of order $(i_1, i_2, \ldots, i_d)$ is given by

$$E\bigl(X_1^{i_1} \cdots X_d^{i_d}\bigr),$$

provided that the expectation exists. Such moments involving three or more random variables arise only occasionally and we will not consider them here.
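
Although such moments are not pursued further here, a Monte Carlo estimate shows what a joint moment of a given order looks like in practice (a sketch under assumed choices of distribution and order, not an example from the text):

```python
import numpy as np

rng = np.random.default_rng(2)

# An assumed pair of dependent variables: X2 = X1 + Z, with X1 and Z
# independent standard normal.
x1 = rng.normal(size=500_000)
x2 = x1 + rng.normal(size=500_000)

# Joint moment of (X1, X2) of order (i1, i2) = (2, 2); here
# E(X1^2 X2^2) = E(X1^4) + E(X1^2) E(Z^2) = 3 + 1 = 4.
i1, i2 = 2, 2
print(np.mean(x1**i1 * x2**i2))   # approximately 4
```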
Let $X$ denote a $d$-dimensional random vector and write $X = (X_1, X_2, \ldots, X_d)$, where $X_1, \ldots, X_d$ are real-valued. We are often interested in the set of all covariances of pairs