Page 180 - Matrices theory and applications

P. 180

163
9.5. The Method of the Conjugate Gradient
Theorem 9.5.1 If k ≤ l,then
2k

K(A) − 1
.
(9.6)
E(x k − ¯x) ≤ 4E(x 0 − ¯x)

K(A)+ 1
We now set r k = r(x k )= A(x k − ¯x). We have seen that r l =0 and that
r k =0 if k< l.In fact, r k is the gradient of J at x k . The minimality of J at
x k over x 0 +H k thus implies that r k ⊥H k (for the usual scalar product). In
other words, we have r k ,p j
=0 if j< k. However, x k −¯x ∈ e 0 +H k can also
x
be written as x k − ¯ = Q(A)e 0 with deg Q ≤ k, which implies r k = Q(A)r 0 ,
so that r k ∈H k+1 .If k< l, one therefore has H k+1 = H k ⊕ IRr k .
We now normalize p k (which was not done up to now) by
p k − r k ∈H k .
In other words, p k is the A-orthogonal projection of r k = r(x k ), parallel to
H k . It is actually an element of H k+1 ,since r k ∈H k+1 . It is also nonzero
since r k ∈H k .Wenotethat r k is orthogonal to H k with respect to the
usual scalar product, though p k is orthogonal to H k with respect to the
A-scalar product; this explains why p k and r k are generally diﬀerent.
If j ≤ k−2, we compute A(p k −r k ),p j
= − Ar k ,p j
= − r k ,Ap j
=0.
We have used successively the conjugation of the p k , the symmetry of A,
the fact that Ap j ∈H j+2 , and the orthogonality of r k and H k .Wehave
therefore p k − r k ⊥ AH k−1 ,so that
p k = r k + δ k p k−1 (9.7)
for a suitable number δ k .
9.5.2 Implementing the Conjugate Gradient
The main feature of the conjugate gradient is the simplicity of the com-
putation of the vectors x k , which is done by induction. To begin with, we
have p 0 = r 0 = Ax 0 − b,where x 0 is at our disposal. Let us assume now
that x k and p k−1 are known. Then r k = Ax k − b.If r k =0, we already
have the solution. Otherwise, the formulas (9.5, 9.7) show that in fact,
x k+1 minimizes J over the plane x k + IRr k ⊕ IRp k−1 . We therefore have
x k+1 = x k +α k r k +β k p k−1 , where the entries α k ,β k are obtained by solving
the linear system of two equations
2
α k Ar k ,r k
+ β k Ar k ,p k−1
+ r k =0,
α k Ar k ,p k−1
+ β k Ap k−1 ,p k−1
=0

(we have used r k ,p k−1
=0). Then we have δ k = β k /α k .Observe that α k
is nonzero, because otherwise β k wouldvanishand r k would too.
Summing up, the algorithm reads as follows

• Choose x 0 ; then deﬁne p 0 = r 0 = r(x 0 ):= Ax 0 − b.

175 176 177 178 179 180 181 182 183 184 185