• E[X] = µ_X denotes the mean (or expected value) of X.
• Var[X] = E[(X − µ_X)²] = E[X²] − µ_X² is the variance of X.
• Cov[X, Y] = E[(X − µ_X)(Y − µ_Y)] = E[XY] − µ_X µ_Y is the covariance of X and Y.
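The second equality in each of the last two bullets follows from expanding the product and using linearity of expectation. As a quick numerical sanity check, here is a minimal Python sketch (using NumPy and made-up sample data, not anything from the text) verifying that the two expressions for Var[X] and Cov[X, Y] agree on empirical averages:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=1.5, size=100_000)   # samples of X
    y = 0.5 * x + rng.normal(size=100_000)             # samples of a correlated Y

    mu_x, mu_y = x.mean(), y.mean()

    # Var[X] = E[(X - mu_X)^2] = E[X^2] - mu_X^2
    var_direct   = ((x - mu_x) ** 2).mean()
    var_shortcut = (x ** 2).mean() - mu_x ** 2

    # Cov[X, Y] = E[(X - mu_X)(Y - mu_Y)] = E[XY] - mu_X * mu_Y
    cov_direct   = ((x - mu_x) * (y - mu_y)).mean()
    cov_shortcut = (x * y).mean() - mu_x * mu_y

    print(np.isclose(var_direct, var_shortcut))   # True
    print(np.isclose(cov_direct, cov_shortcut))   # True

Because these are algebraic identities, the two computations agree to floating-point roundoff on any data set.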
Minimum Variance Unbiased Estimators
An estimator θ̂ (considered as a random variable) for a parameter θ is said to be unbiased when E[θ̂] = θ, and θ̂ is called a minimum variance unbiased estimator for θ whenever Var[θ̂] ≤ Var[φ̂] for all unbiased estimators φ̂ of θ.
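To see these definitions in action, consider estimating the mean µ of a population with variance σ² from m independent observations. Both the sample mean and the first observation alone are unbiased estimators of µ, but their variances are σ²/m and σ², respectively. The following Python sketch (with hypothetical parameter values) exhibits the difference empirically; it illustrates, but of course does not prove, the minimum variance property.

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, m = 5.0, 2.0, 25
    trials = 20_000

    # Each row is one experiment consisting of m observations.
    samples = rng.normal(mu, sigma, size=(trials, m))

    theta_mean  = samples.mean(axis=1)   # sample-mean estimator
    theta_first = samples[:, 0]          # estimator using only the first observation

    print(theta_mean.mean(), theta_first.mean())   # both near mu = 5 (unbiased)
    print(theta_mean.var(), theta_first.var())     # near sigma^2/m = 0.16 vs sigma^2 = 4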
These ideas make it possible to precisely articulate why the method of least squares is the best way to fit observed data. Let Y be a variable that is known (or assumed) to be linearly related to other variables X_1, X_2, ..., X_n according to the equation⁶²

    Y = β_1 X_1 + ··· + β_n X_n,                            (5.14.1)
where the β_i's are unknown constants (parameters). Suppose that the values assumed by the X_i's are not subject to error or variation and can be exactly observed or specified, but, due perhaps to measurement error, the values of Y cannot be exactly observed. Instead, we observe

    y = Y + ε = β_1 X_1 + ··· + β_n X_n + ε,                (5.14.2)
where ε is a random variable accounting for the measurement error. For example, consider the problem of determining the velocity v of a moving object by measuring the distance D it has traveled at various points in time T by using the linear relation D = vT. Time can be prescribed at exact values such as T_1 = 1 second, T_2 = 2 seconds, etc., but observing the distance traveled at the prescribed values of T will almost certainly involve small measurement errors so that in reality the observed distances satisfy d = D + ε = vT + ε.
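To make the velocity example concrete, here is a minimal Python sketch (with an assumed true velocity v = 3 and made-up noise, not data from the text) that recovers v from noisy distance observations d_i = v T_i + ε_i by least squares:

    import numpy as np

    rng = np.random.default_rng(2)
    v_true = 3.0                                  # assumed true velocity (hypothetical)
    t = np.arange(1.0, 11.0)                      # prescribed, error-free times T_i
    d = v_true * t + rng.normal(0, 0.1, t.size)   # observed distances with noise

    # One-parameter least squares: minimize sum_i (d_i - v*t_i)^2.
    # Setting the derivative to zero gives v_hat = (sum t_i d_i)/(sum t_i^2).
    v_hat = (t @ d) / (t @ t)
    print(v_hat)   # close to 3.0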
Now consider the general problem of determining the parameters β_k in (5.14.1) by observing (or measuring) values of Y at m different points X_i* = (x_i1, x_i2, ..., x_in) ∈ ℜⁿ, where x_ij is the value of X_j to be used when making the ith observation. If y_i denotes the random variable that represents the outcome of the ith observation of Y, then according to (5.14.2),

    y_i = β_1 x_i1 + ··· + β_n x_in + ε_i,    i = 1, 2, ..., m,    (5.14.3)
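Stacking the m equations in (5.14.3) gives the matrix form y = Xβ + ε, where the ith row of the m × n matrix X is X_i*. A hedged Python sketch of this general setup (with hypothetical dimensions, parameters, and noise):

    import numpy as np

    rng = np.random.default_rng(3)
    m, n = 50, 3                                 # hypothetical problem size
    X = rng.normal(size=(m, n))                  # row i holds x_i1, ..., x_in
    beta = np.array([1.0, -2.0, 0.5])            # assumed true parameters
    y = X @ beta + rng.normal(0, 0.05, size=m)   # noisy observations per (5.14.3)

    # Least squares estimate: the minimizer of ||y - X b||_2, computed by
    # NumPy via an SVD rather than by forming the normal equations explicitly.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta_hat)   # close to the assumed beta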
⁶² Equation (5.14.1) is called a no-intercept model, whereas the slightly more general equation Y = β_0 + β_1 X_1 + ··· + β_n X_n is known as an intercept model. Since the analysis for an intercept model is not significantly different from the analysis of the no-intercept case, we deal only with the no-intercept case and leave the intercept model for the reader to develop.
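For readers who want a head start on the intercept model, one standard device (sketched below in Python, with hypothetical data) is to adjoin a constant variable X_0 = 1; the intercept β_0 then becomes an ordinary coefficient, and the no-intercept machinery applies unchanged.

    import numpy as np

    rng = np.random.default_rng(4)
    m = 50
    x = rng.uniform(0, 10, size=m)
    y = 4.0 + 2.5 * x + rng.normal(0, 0.2, size=m)   # hypothetical intercept model

    # Prepending a column of ones makes beta_0 the coefficient of the
    # constant "variable" X_0 = 1, reducing this to the no-intercept form.
    X = np.column_stack([np.ones(m), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(coef)   # approximately [4.0, 2.5]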