FIGURE 10.6: The function ρ(x; σ) = x²/(σ² + x²), plotted for σ = 0.1, 1, and 10, with a plot of y = x² for comparison. Replacing quadratic terms with ρ reduces the influence of outliers on a fit. A point that is several multiples of σ away from the fitted curve is going to have almost no effect on the coefficients of the fitted curve, because the value of ρ will be close to 1 and will change extremely slowly with the distance from the fitted curve.
squares and total least squares line-fitting errors, which differ only in the form of the residual error, both have ρ(u; σ) = u². The trick to M-estimators is to make ρ(u; σ) look like u² for part of its range and then flatten out; we expect that ρ(u; σ) increases monotonically, and is close to a constant value for large u. A common choice is

    ρ(u; σ) = u²/(σ² + u²).

The parameter σ controls the point at which the function flattens out, and we have plotted a variety of examples in Figure 10.6.
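A few lines of Python make the flattening concrete (a minimal sketch of our own, assuming NumPy; the name rho and the sample values are ours, not from the text): residuals well inside σ are penalized roughly quadratically, while residuals far outside σ all cost about the same, close to 1.

    import numpy as np

    def rho(u, sigma):
        # The M-estimator from the text: rho(u; sigma) = u^2 / (sigma^2 + u^2).
        return u**2 / (sigma**2 + u**2)

    u = np.array([0.01, 0.1, 1.0, 10.0, 100.0])
    for sigma in (0.1, 1.0, 10.0):
        # For |u| << sigma the loss grows like (u/sigma)^2;
        # for |u| >> sigma it saturates near 1.
        print(f"sigma={sigma}:", np.round(rho(u, sigma), 4))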
There are many other M-estimators available. Typically, they are discussed in terms of their influence function, which is defined as

    ∂ρ/∂θ.
This is natural because our minimization criterion yields

    Σᵢ ∂ρ(rᵢ(xᵢ, θ); σ)/∂θ = 0
at the solution. For the kind of problems we consider, we would expect a good influence function to be antisymmetric (there is no difference between a slight overprediction and a slight underprediction) and to tail off with large values, because we want to limit the influence of the outliers.
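For the particular ρ above, this shape is easy to check directly (a computation of our own, differentiating with respect to the residual u rather than the parameters θ): the quotient rule gives

    ∂ρ/∂u = 2σ²u/(σ² + u²)²,

which is an odd function of u, so it is antisymmetric, and it falls off like 1/u³ for large u, so distant outliers contribute almost nothing.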
There are two tricky issues with using M-estimators. First, the minimization problem is non-linear and must be solved iteratively. The standard difficulties apply: there might be more than one local minimum, the method might diverge, and its behavior is likely to depend strongly on the start point.
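To make the iteration concrete, here is a minimal sketch of our own (not an algorithm prescribed by the text; the helper name fit_line_irls, the line model y = ax + b, and the choice of iteratively reweighted least squares are all our assumptions) of robustly fitting a line with the ρ above:

    import numpy as np

    def fit_line_irls(x, y, sigma=1.0, n_iter=20):
        # Fit y ~ a*x + b, replacing squared residuals with
        # rho(u; sigma) = u^2 / (sigma^2 + u^2) and minimizing by
        # iteratively reweighted least squares (IRLS).
        A = np.column_stack([x, np.ones_like(x)])
        a, b = np.linalg.lstsq(A, y, rcond=None)[0]  # ordinary LS start point
        for _ in range(n_iter):
            r = y - (a * x + b)                      # residuals of current fit
            # IRLS weight w(u) = (drho/du)/u = 2*sigma^2 / (sigma^2 + u^2)^2:
            # points many multiples of sigma away get almost zero weight.
            w = 2 * sigma**2 / (sigma**2 + r**2)**2
            sw = np.sqrt(w)
            a, b = np.linalg.lstsq(sw[:, None] * A, sw * y, rcond=None)[0]
        return a, b

    # Hypothetical usage: a noisy line with one gross outlier.
    x = np.linspace(0, 10, 20)
    y = 2.0 * x + 1.0 + 0.1 * np.random.randn(20)
    y[3] += 50.0
    print(fit_line_irls(x, y, sigma=0.5))

Because the weights are recomputed from the current residuals, different start points can lead to different fits; starting from the ordinary least squares solution, as above, is common, but it can fail when the outliers are severe enough to corrupt that start point, which is exactly the sensitivity described here.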