Page 232 - Matrix Analysis & Applied Linear Algebra

4.6 Classical Least Squares

Applying these rules to the function in (4.6.3) produces

$$\frac{\partial f}{\partial x_i} = \left(\frac{\partial x}{\partial x_i}\right)^T A^T A x + x^T A^T A \frac{\partial x}{\partial x_i} - 2\left(\frac{\partial x}{\partial x_i}\right)^T A^T b.$$

Since $\partial x/\partial x_i = e_i$ (the $i$th unit vector), we have

$$\frac{\partial f}{\partial x_i} = e_i^T A^T A x + x^T A^T A e_i - 2e_i^T A^T b = 2e_i^T A^T A x - 2e_i^T A^T b.$$
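As a quick numerical sanity check (not part of the text), the gradient formula above can be compared against a finite-difference approximation of $\partial f/\partial x_i$; a minimal NumPy sketch with arbitrarily chosen $A$ and $b$:

```python
import numpy as np

# f(x) = ||Ax - b||^2; the derivation says its gradient is 2*A^T*A*x - 2*A^T*b
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # sizes chosen arbitrarily for illustration
b = rng.standard_normal(5)
x = rng.standard_normal(3)

f = lambda x: np.sum((A @ x - b) ** 2)
grad = 2 * A.T @ A @ x - 2 * A.T @ b

# central-difference estimate of each partial derivative df/dx_i
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(3)])
print(np.allclose(grad, fd, atol=1e-4))  # True
```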
Using $e_i^T A^T = (A^T)_{i*}$ and setting $\partial f/\partial x_i = 0$ produces the $n$ equations

$$(A^T)_{i*}\, A x = (A^T)_{i*}\, b \quad \text{for } i = 1, 2, \ldots, n,$$
which can be written as the single matrix equation $A^T A x = A^T b$. Calculus guarantees that the minimum value of $f$ occurs at some solution of this system.
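A small numerical illustration (the matrix and vector here are invented for the example): solving the system $A^T A x = A^T b$ directly and comparing the result with NumPy's dedicated least squares routine:

```python
import numpy as np

# A small overdetermined system Ax = b (4 equations, 2 unknowns)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([2.0, 3.0, 5.0, 6.0])

# Solve the normal equations A^T A x = A^T b (assumes A has full column rank)
x = np.linalg.solve(A.T @ A, A.T @ b)

# The result agrees with NumPy's least squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_lstsq))  # True
```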
But this is not enough—we want to know that every solution of $A^T A x = A^T b$ is a least squares solution. So we must show that the function $f$ in (4.6.3) attains its minimum value at each solution to $A^T A x = A^T b$. Observe that if $z$ is a solution to the normal equations, then $f(z) = b^T b - z^T A^T b$. For any other
$y \in \Re^{n\times 1}$, let $u = y - z$, so $y = z + u$, and observe that

$$f(y) = f(z) + v^T v, \quad \text{where } v = Au.$$

Since $v^T v = \sum_i v_i^2 \ge 0$, it follows that $f(z) \le f(y)$ for all $y \in \Re^{n\times 1}$, and
                                    thus f attains its minimum value at each solution of the normal equations. The
                                    remaining statements in the theorem follow from the properties established on
                                    p. 213.
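The key identity in this argument, $f(y) = f(z) + v^T v$ with $v = Au$, is easy to confirm numerically; a sketch with randomly generated data (sizes chosen arbitrarily):

```python
import numpy as np

# Verify f(y) = f(z) + v^T v with v = A(y - z), where z solves the normal equations
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

f = lambda x: np.sum((A @ x - b) ** 2)
z = np.linalg.solve(A.T @ A, A.T @ b)   # a solution of the normal equations

y = z + rng.standard_normal(3)          # any other point
v = A @ (y - z)
print(np.isclose(f(y), f(z) + v @ v))   # True
print(f(z) <= f(y))                     # True
```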
                                        The classical least squares problem discussed at the beginning of this sec-
                                    tion and illustrated in Example 4.6.1 is part of a broader topic known as linear
                                    regression, which is the study of situations where attempts are made to express
one variable $y$ as a linear combination of other variables $t_1, t_2, \ldots, t_n$. In practice, hypothesizing that $y$ is linearly related to $t_1, t_2, \ldots, t_n$ means that one assumes the existence of a set of constants $\{\alpha_0, \alpha_1, \ldots, \alpha_n\}$ (called parameters) such that

$$y = \alpha_0 + \alpha_1 t_1 + \alpha_2 t_2 + \cdots + \alpha_n t_n + \varepsilon,$$
                                    where ε is a “random function” whose values “average out” to zero in some
                                    sense. Practical problems almost always involve more variables than we wish to
                                    consider, but it is frequently fair to assume that the effect of variables of lesser
                                    significance will indeed “average out” to zero. The random function ε accounts
                                    for this assumption. In other words, a linear hypothesis is the supposition that
                                    the expected (or mean) value of y at each point where the phenomenon can be
                                    observed is given by a linear equation
$$E(y) = \alpha_0 + \alpha_1 t_1 + \alpha_2 t_2 + \cdots + \alpha_n t_n.$$
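As a hypothetical illustration of the linear hypothesis with a single variable $t$ (the data below is fabricated for the example), the parameters $\alpha_0$ and $\alpha_1$ can be estimated by solving the normal equations:

```python
import numpy as np

# Estimate alpha_0, alpha_1 in y = alpha_0 + alpha_1*t + eps from noisy samples
rng = np.random.default_rng(2)
t = np.linspace(0.0, 10.0, 50)
y = 1.5 + 0.8 * t + 0.1 * rng.standard_normal(t.size)  # true parameters: 1.5, 0.8

# Design matrix: a column of ones (for alpha_0) next to the observations t
A = np.column_stack([np.ones_like(t), t])
alpha = np.linalg.solve(A.T @ A, A.T @ y)  # normal equations A^T A alpha = A^T y

print(alpha)  # estimates of (alpha_0, alpha_1), close to the true values
```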