Page 180 - Matrices theory and applications
P. 180

163
                                                    9.5. The Method of the Conjugate Gradient
                              Theorem 9.5.1 If k ≤ l,then
                                                                              2k


                                                                    K(A) − 1
                                                                                 .
                                                                                          (9.6)
                                           E(x k − ¯x) ≤ 4E(x 0 − ¯x)

                                                                    K(A)+ 1
                                We now set r k = r(x k )= A(x k − ¯x). We have seen that r l =0 and that
                              r k  =0 if k< l.In fact, r k is the gradient of J at x k . The minimality of J at
                              x k over x 0 +H k thus implies that r k ⊥H k (for the usual scalar product). In
                              other words, we have  r k ,p j 
 =0 if j< k. However, x k −¯x ∈ e 0 +H k can also
                                              x
                              be written as x k − ¯ = Q(A)e 0 with deg Q ≤ k, which implies r k = Q(A)r 0 ,
                              so that r k ∈H k+1 .If k< l, one therefore has H k+1 = H k ⊕ IRr k .
                                We now normalize p k (which was not done up to now) by
                                                        p k − r k ∈H k .
                              In other words, p k is the A-orthogonal projection of r k = r(x k ), parallel to
                              H k . It is actually an element of H k+1 ,since r k ∈H k+1 . It is also nonzero
                              since r k  ∈H k .Wenotethat r k is orthogonal to H k with respect to the
                              usual scalar product, though p k is orthogonal to H k with respect to the
                              A-scalar product; this explains why p k and r k are generally different.
                                If j ≤ k−2, we compute  A(p k −r k ),p j 
 = − Ar k ,p j 
 = − r k ,Ap j 
 =0.
                              We have used successively the conjugation of the p k , the symmetry of A,
                              the fact that Ap j ∈H j+2 , and the orthogonality of r k and H k .Wehave
                              therefore p k − r k ⊥ AH k−1 ,so that
                                                       p k = r k + δ k p k−1              (9.7)
                              for a suitable number δ k .
                              9.5.2 Implementing the Conjugate Gradient
                              The main feature of the conjugate gradient is the simplicity of the com-
                              putation of the vectors x k , which is done by induction. To begin with, we
                              have p 0 = r 0 = Ax 0 − b,where x 0 is at our disposal. Let us assume now
                              that x k and p k−1 are known. Then r k = Ax k − b.If r k =0, we already
                              have the solution. Otherwise, the formulas (9.5, 9.7) show that in fact,
                              x k+1 minimizes J over the plane x k + IRr k ⊕ IRp k−1 . We therefore have
                              x k+1 = x k +α k r k +β k p k−1 , where the entries α k ,β k are obtained by solving
                              the linear system of two equations
                                                                           2
                                             α k  Ar k ,r k 
 + β k  Ar k ,p k−1 
 +  r k   =0,
                                               α k  Ar k ,p k−1 
 + β k  Ap k−1 ,p k−1 
 =0

                              (we have used  r k ,p k−1 
 =0). Then we have δ k = β k /α k .Observe that α k
                              is nonzero, because otherwise β k wouldvanishand r k would too.
                                Summing up, the algorithm reads as follows

                                 • Choose x 0 ; then define p 0 = r 0 = r(x 0 ):= Ax 0 − b.
   175   176   177   178   179   180   181   182   183   184   185