5.14 Why Least Squares?


•  $E[X] = \mu_X$ denotes the mean (or expected value) of $X$.
•  $\text{Var}[X] = E\left[(X - \mu_X)^2\right] = E[X^2] - \mu_X^2$ is the variance of $X$.
•  $\text{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y$ is the covariance of $X$ and $Y$.
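
To make these definitions concrete, here is a minimal numerical sketch (the distribution and its parameters are arbitrary choices for illustration). For an empirical distribution, where expectation means averaging over the sample, the two shortcut identities above hold exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=100_000)   # arbitrary illustrative data
y = 0.5 * x + rng.normal(size=100_000)             # a variable correlated with x

mu_x, mu_y = x.mean(), y.mean()

# Var[X] = E[(X - mu_X)^2] = E[X^2] - mu_X^2
var_direct   = np.mean((x - mu_x) ** 2)
var_shortcut = np.mean(x ** 2) - mu_x ** 2

# Cov[X,Y] = E[(X - mu_X)(Y - mu_Y)] = E[XY] - mu_X * mu_Y
cov_direct   = np.mean((x - mu_x) * (y - mu_y))
cov_shortcut = np.mean(x * y) - mu_x * mu_y

print(np.isclose(var_direct, var_shortcut))   # True
print(np.isclose(cov_direct, cov_shortcut))   # True
```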

Minimum Variance Unbiased Estimators

An estimator $\hat{\theta}$ (considered as a random variable) for a parameter $\theta$ is said to be unbiased when $E[\hat{\theta}] = \theta$, and $\hat{\theta}$ is called a minimum variance unbiased estimator for $\theta$ whenever $\text{Var}[\hat{\theta}] \le \text{Var}[\hat{\phi}]$ for all unbiased estimators $\hat{\phi}$ of $\theta$.
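
As a small illustration of this definition, the following Monte Carlo sketch (sample sizes and distribution parameters are invented for illustration) compares two unbiased estimators of a population mean $\mu$: the sample mean $\bar{X}$ and the single observation $X_1$. Both have expected value $\mu$, but $\text{Var}[\bar{X}] = \sigma^2/n$ is much smaller than $\text{Var}[X_1] = \sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, trials = 5.0, 2.0, 25, 50_000

# Each row is one experiment consisting of n observations.
samples = rng.normal(mu, sigma, size=(trials, n))

theta_mean  = samples.mean(axis=1)  # sample mean: unbiased, Var = sigma^2 / n
theta_first = samples[:, 0]         # first observation alone: also unbiased, Var = sigma^2

print(theta_mean.mean(), theta_first.mean())  # both approx 5.0 (unbiased)
print(theta_mean.var(), theta_first.var())    # approx 0.16 vs. approx 4.0
```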
These ideas make it possible to articulate precisely why the method of least squares is the best way to fit observed data. Let $Y$ be a variable that is known (or assumed) to be linearly related to other variables $X_1, X_2, \ldots, X_n$ according to the equation⁶²

$$Y = \beta_1 X_1 + \cdots + \beta_n X_n, \tag{5.14.1}$$
where the $\beta_i$'s are unknown constants (parameters). Suppose that the values assumed by the $X_i$'s are not subject to error or variation and can be exactly observed or specified, but, due perhaps to measurement error, the values of $Y$ cannot be exactly observed. Instead, we observe

$$y = Y + \varepsilon = \beta_1 X_1 + \cdots + \beta_n X_n + \varepsilon, \tag{5.14.2}$$

where $\varepsilon$ is a random variable accounting for the measurement error. For example, consider the problem of determining the velocity $v$ of a moving object by measuring the distance $D$ it has traveled at various points in time $T$ by using the linear relation $D = vT$. Time can be prescribed at exact values such as $T_1 = 1$ second, $T_2 = 2$ seconds, etc., but observing the distance traveled at the prescribed values of $T$ will almost certainly involve small measurement errors, so that in reality the observed distances satisfy $d = D + \varepsilon = vT + \varepsilon$.
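
A small simulation makes this concrete (a sketch only; the true velocity and noise level below are invented for illustration). For the one-parameter no-intercept model $d = vT$, minimizing $\sum_i (d_i - v t_i)^2$ gives the least squares estimate $\hat{v} = \sum_i t_i d_i \big/ \sum_i t_i^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
v_true = 3.0                       # hypothetical true velocity, chosen for illustration
t = np.arange(1.0, 11.0)           # prescribed times T = 1, 2, ..., 10 seconds (exact)
d = v_true * t + rng.normal(scale=0.5, size=t.size)  # observed distances d = vT + eps

# Least squares for the one-parameter no-intercept model d = vT:
# v_hat = (sum t_i d_i) / (sum t_i^2), the normal-equations solution.
v_hat = (t @ d) / (t @ t)
print(v_hat)   # close to 3.0
```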
Now consider the general problem of determining the parameters $\beta_k$ in (5.14.1) by observing (or measuring) values of $Y$ at $m$ different points $X_{i*} = (x_{i1}, x_{i2}, \ldots, x_{in}) \in \mathbb{R}^n$, where $x_{ij}$ is the value of $X_j$ to be used when making the $i$th observation. If $y_i$ denotes the random variable that represents the outcome of the $i$th observation of $Y$, then according to (5.14.2),

$$y_i = \beta_1 x_{i1} + \cdots + \beta_n x_{in} + \varepsilon_i, \qquad i = 1, 2, \ldots, m, \tag{5.14.3}$$
⁶² Equation (5.14.1) is called a no-intercept model, whereas the slightly more general equation $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n$ is known as an intercept model. Since the analysis for an intercept model is not significantly different from the analysis of the no-intercept case, we deal only with the no-intercept case and leave the intercept model for the reader to develop.
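
To see why the footnote's reduction works, note that an intercept model becomes a no-intercept model by introducing an artificial variable $X_0$ that is identically 1, so that $\beta_0$ plays the role of an ordinary coefficient. A brief sketch (the dimensions, parameter values, and data below are fabricated for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 50, 2
X = rng.normal(size=(m, n))          # the m observation points X_{i*}
beta = np.array([1.5, -0.7])         # hypothetical true parameters
beta0 = 4.0                          # hypothetical intercept
y = beta0 + X @ beta + rng.normal(scale=0.1, size=m)

# Intercept model as a no-intercept model: prepend a column X_0 = 1, so that
# y_i = beta_0 * 1 + beta_1 x_i1 + ... + beta_n x_in + eps_i has the form (5.14.3).
X_aug = np.column_stack([np.ones(m), X])
coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(coef)   # approx [4.0, 1.5, -0.7]
```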