Page 450 - Matrix Analysis & Applied Linear Algebra
446 Chapter 5 Norms, Inner Products, and Orthogonality
5.14 WHY LEAST SQUARES?
Drawing inferences about natural phenomena based upon physical observations
and estimating characteristics of large populations by examining small samples
are fundamental concerns of applied science. Numerical characteristics of a
phenomenon or population are often called parameters, and the goal is to design
functions or rules called estimators that use observations or samples to estimate
parameters of interest. For example, the mean height $h$ of all people is a
parameter of the world’s population, and one way of estimating $h$ is to observe
the mean height of a sample of $k$ people. In other words, if $h_i$ is the height of
the $i$th person in a sample, the function $\hat{h}$ defined by

$$\hat{h}(h_1, h_2, \ldots, h_k) = \frac{1}{k} \sum_{i=1}^{k} h_i$$

is an estimator for $h$. Moreover, $\hat{h}$ is a linear estimator because $\hat{h}$ is a linear
function of the observations.
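
As a small illustration of the sample-mean estimator and its linearity, the following Python sketch may help. The function name `sample_mean` and the height values are made up for this example; they are assumptions, not data from the text.

```python
def sample_mean(heights):
    """Estimator h_hat: the average of the observed heights."""
    return sum(heights) / len(heights)

# Two hypothetical samples of k = 4 heights in centimeters (invented data).
a = [170.0, 165.0, 180.0, 175.0]
b = [160.0, 172.0, 168.0, 181.0]

# Linearity check: h_hat(alpha*a + b) should equal alpha*h_hat(a) + h_hat(b).
alpha = 2.0
combined = [alpha * x + y for x, y in zip(a, b)]
lhs = sample_mean(combined)
rhs = alpha * sample_mean(a) + sample_mean(b)

print(sample_mean(a), lhs, rhs)
```

The two sides agree because the estimator is just a fixed weighted sum of the observations, with every weight equal to $1/k$.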
Good estimators should possess at least two properties: they should be
unbiased and they should have minimal variance. For example, consider estimating
the center of a circle drawn on a wall by asking Larry, Moe, and Curly to each
throw one dart at the circle. To decide which estimator is best, we need to know
more about each thrower’s style. Larry can throw a tight pattern, but he is
known to have a left-hand bias in his style. Moe doesn’t suffer
from a bias, but he tends to throw a rather large pattern. However, Curly can
throw a tight pattern without a bias. Typical patterns are shown below.
[Figure: typical dart patterns for Larry, Moe, and Curly]
Although Larry has a small variance, he is an unacceptable estimator because
he is biased in the sense that his average is significantly different from
the center. Moe and Curly are each unbiased estimators because they have an
average that is the center, but Curly is clearly the preferred estimator because
his variance is much smaller than Moe’s. In other words, Curly is the unbiased
estimator of minimal variance.
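
The dart example can be mimicked with a short simulation. This is only a sketch under invented assumptions: the circle's center is placed at (0, 0), each throw is modeled as Gaussian, and the bias and spread values assigned to each thrower are hypothetical numbers chosen to match the verbal description, not values from the text.

```python
import random
import statistics

def throw_darts(rng, bias_x, spread, n):
    """Return n simulated (x, y) dart positions around a center at (0, 0)."""
    return [(rng.gauss(bias_x, spread), rng.gauss(0.0, spread)) for _ in range(n)]

def summarize(throws):
    """Return the average landing point and the total sample variance."""
    xs = [x for x, _ in throws]
    ys = [y for _, y in throws]
    mean = (statistics.fmean(xs), statistics.fmean(ys))
    # Total variance: sum of the per-coordinate sample variances.
    variance = statistics.variance(xs) + statistics.variance(ys)
    return mean, variance

rng = random.Random(0)  # fixed seed so repeated runs give the same throws
throwers = {
    "Larry": throw_darts(rng, bias_x=-1.0, spread=0.2, n=10_000),  # tight, biased left
    "Moe": throw_darts(rng, bias_x=0.0, spread=1.0, n=10_000),     # unbiased, wide
    "Curly": throw_darts(rng, bias_x=0.0, spread=0.2, n=10_000),   # unbiased, tight
}
results = {name: summarize(t) for name, t in throwers.items()}
for name, (mean, variance) in results.items():
    print(f"{name}: mean=({mean[0]:+.2f}, {mean[1]:+.2f}), variance={variance:.2f}")
```

Under these assumptions, Larry's average sits well to the left of the center, Moe's and Curly's averages sit near it, and Curly's variance is far smaller than Moe's, which matches the ranking described above.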
To make these ideas more formal, let’s adopt the following standard
notation and terminology from elementary probability theory concerning random
variables X and Y.

