Section 10.4 Robustness
For s = 1 to s = k
    Draw a subset of r distinct points, chosen uniformly at random
    Fit to this set of points using least squares to obtain an
        initial set of parameters θ_s^0
    Estimate σ_s^0 using θ_s^0
    Until convergence (usually |θ_s^n − θ_s^{n−1}| is small):
        Take a minimizing step using θ_s^{n−1}, σ_s^{n−1} to get θ_s^n
        Now compute σ_s^n
    end
end
Report the best fit of this set of k trials, using the median of the
residuals as a criterion

Algorithm 10.3: Using an M-Estimator to Fit a Least Squares Model.
A common strategy for dealing with this problem is to draw a subsample of the
dataset, fit to that subsample using least squares, and use this as a start point for
the fitting process. We do this for a large number of different subsamples, enough
to ensure that there is a high probability that there is at least one subsample that
consists entirely of good data points (Algorithm 10.3).
Second, as Figures 10.7 and 10.8 indicate, the estimators require a sensible
estimate of σ, which is often referred to as scale. Typically, the scale estimate is
supplied at each iteration of the solution method; a popular estimate of scale is
σ^(n) = 1.4826 median_i | r_i^(n)(x_i; θ^(n−1)) | .
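Concretely, this estimate is a scaled median of the absolute residuals. A minimal sketch in Python (the residual values below are invented for illustration):

```python
from statistics import median

# Hypothetical residuals at iteration n; the last one is a gross outlier.
residuals = [0.1, -0.3, 0.2, -0.1, 5.0]

# sigma^(n) = 1.4826 * median_i |r_i|; the factor 1.4826 makes the
# estimate consistent with the standard deviation of Gaussian residuals.
sigma = 1.4826 * median(abs(r) for r in residuals)
print(sigma)  # median |r| is 0.2, so sigma is about 0.2965
```

Note that the single large residual barely affects the estimate, which is the point of using the median rather than the mean.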
We summarize a general M-estimator in Algorithm 10.3.
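As a concrete illustration of Algorithm 10.3, the sketch below fits a line y = a·x + b using iteratively reweighted least squares with a Huber-style weight function. The Huber choice, the trial count, the tolerance, and all function names are our illustrative assumptions, not prescriptions from the text:

```python
import random
from statistics import median

def wls_line(points, w):
    """Weighted least-squares fit of y = a*x + b (assumes distinct x values)."""
    sw  = sum(w)
    sx  = sum(wi * x for wi, (x, _) in zip(w, points))
    sy  = sum(wi * y for wi, (_, y) in zip(w, points))
    sxx = sum(wi * x * x for wi, (x, _) in zip(w, points))
    sxy = sum(wi * x * y for wi, (x, y) in zip(w, points))
    a = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    return a, (sy - a * sx) / sw

def m_estimate_line(points, k=20, r=2, huber_k=1.345, tol=1e-8, seed=0):
    """Algorithm 10.3 for a line: k random starts, IRLS inner loop."""
    rng, best = random.Random(seed), None
    for _ in range(k):
        # Draw a subset of r distinct points; fit by least squares for theta_s^0
        theta = wls_line(rng.sample(points, r), [1.0] * r)
        for _ in range(100):  # until convergence
            res = [y - theta[0] * x - theta[1] for x, y in points]
            # Scale estimate: 1.4826 * median absolute residual (guard zero)
            sigma = (1.4826 * median(abs(ri) for ri in res)) or 1e-12
            # Huber weights: full weight near zero, downweighted tails
            w = [1.0 if abs(ri) <= huber_k * sigma
                 else huber_k * sigma / abs(ri) for ri in res]
            new = wls_line(points, w)  # minimizing step
            done = abs(new[0] - theta[0]) + abs(new[1] - theta[1]) < tol
            theta = new
            if done:
                break
        # Best of the k trials, judged by the median of the residuals
        score = median(abs(y - theta[0] * x - theta[1]) for x, y in points)
        if best is None or score < best[0]:
            best = (score, theta)
    return best[1]

# Ten points on y = 2x + 1 plus two gross outliers (x values all distinct)
pts = [(float(x), 2.0 * x + 1.0) for x in range(10)] + [(2.5, 30.0), (7.5, -40.0)]
a, b = m_estimate_line(pts)  # recovers a close to 2, b close to 1
```

The random restarts matter: a trial seeded from a subset containing an outlier may converge to a poor fit, but the median-residual criterion discards it in favor of trials seeded entirely from good points.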
10.4.2 RANSAC: Searching for Good Points
An alternative to modifying the cost function is to search the collection of data
points for good points. This is quite easily done by an iterative process: First, we
choose a small subset of points and fit to that subset, and then we see how many
other points fit to the resulting object. We continue this process until we have a
high probability of finding the structure we are looking for.
For example, assume that we are fitting a line to a dataset that consists of
about 50% outliers. We can fit a line to only two points. If we draw pairs of points
uniformly and at random, then about a quarter of these pairs will consist entirely
of good data points (each point drawn is good with probability about 1/2, so both
are with probability about 1/4). We can identify these good pairs by noticing that a large
collection of other points lie close to the line fitted to such a pair. Of course, a
better estimate of the line could then be obtained by fitting a line to the points
that lie close to our current line.
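The pair-sampling procedure just described can be sketched as follows; the data, inlier threshold, and trial count are illustrative choices of ours:

```python
import random

def ransac_line(pts, n_trials=50, thresh=0.5, seed=1):
    """Fit y = a*x + b by sampling point pairs and counting agreement."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_trials):
        (x1, y1), (x2, y2) = rng.sample(pts, 2)
        if x1 == x2:
            continue                      # skip vertical pairs in this sketch
        a = (y2 - y1) / (x2 - x1)         # line through the two sampled points
        b = y1 - a * x1
        # The pair's consensus set: points lying close to its line
        inliers = [(x, y) for x, y in pts if abs(y - a * x - b) <= thresh]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine: least-squares fit to the consensus set of the best trial
    n = len(best_inliers)
    sx = sum(x for x, _ in best_inliers)
    sy = sum(y for _, y in best_inliers)
    sxx = sum(x * x for x, _ in best_inliers)
    sxy = sum(x * y for x, y in best_inliers)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

# Half the data lies on y = 3x - 2; the rest is scattered far from it.
good = [(float(x), 3.0 * x - 2.0) for x in range(8)]
bad = [(0.5, 40.0), (1.5, -25.0), (2.5, 55.0), (3.5, 60.0),
       (4.5, -50.0), (5.5, 33.0), (6.5, -41.0), (7.5, -60.0)]
a, b = ransac_line(good + bad)  # recovers a close to 3, b close to -2
```

Any trial that happens to sample two good points finds the full consensus set of good data, while trials involving an outlier collect almost no supporters, so the final refit uses only the good points.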
Fischler and Bolles (1981) formalized this approach into an algorithm—search
for a random sample that leads to a fit on which many of the data points agree.
The algorithm is usually called RANSAC, for RANdom SAmple Consensus, and is