Proof. It is clear that $\hat{\beta} = X^\dagger y$ is a linear estimator of $\beta$ because each component $\hat{\beta}_i = \sum_k [X^\dagger]_{ik}\, y_k$ is a linear function of the observations. The fact that $\hat{\beta}$ is unbiased follows by using the linear nature of expected value to write
$$E[y] = E[X\beta + \varepsilon] = E[X\beta] + E[\varepsilon] = X\beta + 0 = X\beta,$$
so that
$$E\bigl[\hat{\beta}\bigr] = E\bigl[X^\dagger y\bigr] = X^\dagger E[y] = X^\dagger X\beta = (X^T X)^{-1} X^T X\beta = \beta.$$
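As a quick numerical sanity check (a sketch, not part of the original text), the following simulation draws many noise realizations for an assumed small model and averages $\hat{\beta} = X^\dagger y$; the empirical mean should approach the true $\beta$. The sizes, seed, and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3                        # assumed sizes: m observations, n parameters
X = rng.standard_normal((m, n))     # an arbitrary full-column-rank design matrix
beta = np.array([2.0, -1.0, 0.5])   # illustrative "true" parameters
sigma = 0.3                         # noise standard deviation
Xdag = np.linalg.pinv(X)            # X†, the Moore-Penrose pseudoinverse

trials = 20_000
total = np.zeros(n)
for _ in range(trials):
    y = X @ beta + sigma * rng.standard_normal(m)   # y = Xβ + ε
    total += Xdag @ y                               # accumulate β̂ = X†y
print(total / trials)   # close to beta, illustrating E[β̂] = β
```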
To argue that $\hat{\beta} = X^\dagger y$ has minimal variance among all linear unbiased estimators for $\beta$, let $\beta^*$ be an arbitrary linear unbiased estimator for $\beta$. Linearity of $\beta^*$ implies the existence of a matrix $L_{n \times m}$ such that $\beta^* = Ly$, and unbiasedness insures $\beta = E[\beta^*] = E[Ly] = L\,E[y] = LX\beta$. We want $\beta = LX\beta$ to hold irrespective of the values of the components in $\beta$, so it must be the case that $LX = I_n$ (recall Exercise 3.5.5). For $i \neq j$ we have
$$0 = \operatorname{Cov}[\varepsilon_i, \varepsilon_j] = E[\varepsilon_i \varepsilon_j] - \mu_{\varepsilon_i}\mu_{\varepsilon_j} \;\Longrightarrow\; E[\varepsilon_i \varepsilon_j] = E[\varepsilon_i]\,E[\varepsilon_j] = 0,$$
so that
$$\operatorname{Cov}[y_i, y_j] = \begin{cases} E[(y_i - \mu_{y_i})^2] = E[\varepsilon_i^2] = \operatorname{Var}[\varepsilon_i] = \sigma^2 & \text{when } i = j,\\[4pt] E[(y_i - \mu_{y_i})(y_j - \mu_{y_j})] = E[\varepsilon_i \varepsilon_j] = 0 & \text{when } i \neq j. \end{cases} \tag{5.14.5}$$
This together with the fact that $\operatorname{Var}[aW + bZ] = a^2\operatorname{Var}[W] + b^2\operatorname{Var}[Z]$ whenever $\operatorname{Cov}[W, Z] = 0$ allows us to write
$$\operatorname{Var}[\beta_i^*] = \operatorname{Var}[L_{i*}\, y] = \operatorname{Var}\Bigl[\,\sum_{k=1}^m l_{ik}\, y_k\Bigr] = \sigma^2 \sum_{k=1}^m l_{ik}^2 = \sigma^2 \|L_{i*}\|^2.$$
Since $LX = I$, it follows that $\operatorname{Var}[\beta_i^*]$ is minimal if and only if $L_{i*}$ is the minimum norm solution of the system $z^T X = e_i^T$. We know from (5.12.17) that the (unique) minimum norm solution is given by $z^T = e_i^T X^\dagger = [X^\dagger]_{i*}$, so $\operatorname{Var}[\beta_i^*]$ is minimal if and only if $L_{i*} = [X^\dagger]_{i*}$. Since this holds for $i = 1, 2, \ldots, n$, it follows that $L = X^\dagger$. In other words, the components of $\hat{\beta} = X^\dagger y$ are the (unique) minimal variance linear unbiased estimators for the parameters in $\beta$.
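To see the minimal-variance conclusion numerically (an illustrative sketch with assumed sizes, not part of the text), note that any $L = X^\dagger + M(I - XX^\dagger)$ satisfies $LX = I$ because $XX^\dagger X = X$, so $Ly$ is another linear unbiased estimator whose componentwise variance $\sigma^2\|L_{i*}\|^2$ can only exceed that of $\hat{\beta}$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 3
X = rng.standard_normal((m, n))
beta = np.array([2.0, -1.0, 0.5])
sigma = 0.3
Xdag = np.linalg.pinv(X)

# A competing linear unbiased estimator: L = X† + M(I - XX†)
# satisfies LX = I since XX†X = X, hence E[Ly] = LXβ = β.
M = 0.5 * rng.standard_normal((n, m))
L = Xdag + M @ (np.eye(m) - X @ Xdag)

trials = 20_000
best = np.empty((trials, n))
other = np.empty((trials, n))
for t in range(trials):
    y = X @ beta + sigma * rng.standard_normal(m)
    best[t] = Xdag @ y   # β̂ = X†y
    other[t] = L @ y     # competing unbiased estimator β* = Ly
print(best.var(axis=0))    # ≈ σ²·||X†_{i*}||² componentwise
print(other.var(axis=0))   # never smaller: σ²·||L_{i*}||² ≥ σ²·||X†_{i*}||²
```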
Exercises for section 5.14
5.14.1. For a matrix $Z_{m\times n} = [z_{ij}]$ of random variables, $E[Z]$ is defined to be the $m \times n$ matrix whose $(i,j)$-entry is $E[z_{ij}]$. Consider the standard linear model described in (5.14.4), and let $\hat{e}$ denote the vector of random variables defined by $\hat{e} = y - X\hat{\beta}$, in which $\hat{\beta} = (X^T X)^{-1} X^T y = X^\dagger y$. Demonstrate that
$$\hat{\sigma}^2 = \frac{\hat{e}^T \hat{e}}{m - n}$$
is an unbiased estimator for $\sigma^2$. Hint: $d^T c = \operatorname{trace}(cd^T)$ for column vectors $c$ and $d$, and, by virtue of Exercise 5.9.13,
$$\operatorname{trace}\bigl(I - XX^\dagger\bigr) = m - \operatorname{trace}\bigl(XX^\dagger\bigr) = m - \operatorname{rank}\bigl(XX^\dagger\bigr) = m - n.$$
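A Monte Carlo sanity check of this claim (a sketch under assumed sizes, not a substitute for the requested proof): averaging $\hat{\sigma}^2 = \hat{e}^T\hat{e}/(m-n)$ over many noise draws should reproduce $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 50, 3
X = rng.standard_normal((m, n))
beta = np.array([2.0, -1.0, 0.5])
sigma = 0.3
Xdag = np.linalg.pinv(X)

trials = 20_000
total = 0.0
for _ in range(trials):
    y = X @ beta + sigma * rng.standard_normal(m)
    e_hat = y - X @ (Xdag @ y)            # ê = y - Xβ̂
    total += (e_hat @ e_hat) / (m - n)    # σ̂² = êᵀê/(m − n)
print(total / trials, sigma**2)   # empirical mean ≈ σ² = 0.09
```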