

FIGURE 10.6: The function ρ(x; σ) = x²/(σ² + x²), plotted for σ = 0.1, 1, and 10, with a
plot of y = x² for comparison. Replacing quadratic terms with ρ reduces the influence of
outliers on a fit. A point that is several multiples of σ away from the fitted curve is going
to have almost no effect on the coefficients of the fitted curve, because the value of ρ will
be close to 1 and will change extremely slowly with the distance from the fitted curve.
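The curves in Figure 10.6 are straightforward to reproduce. The following is a minimal sketch, assuming NumPy and Matplotlib are available; the function ρ and the three values of σ come directly from the caption above.

```python
import numpy as np
import matplotlib.pyplot as plt

def rho(x, sigma):
    """Robust kernel rho(x; sigma) = x^2 / (sigma^2 + x^2)."""
    return x**2 / (sigma**2 + x**2)

x = np.linspace(-10, 10, 401)
for sigma in (0.1, 1.0, 10.0):
    plt.plot(x, rho(x, sigma), label=f"sigma = {sigma}")
plt.plot(x, x**2, "--", label="y = x^2")  # quadratic reference curve
plt.ylim(0, 2)                            # match the axis range of Figure 10.6
plt.legend()
plt.show()
```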
squares and total least squares line-fitting errors—which differ only in the form of
the residual error—both have ρ(u; σ) = u². The trick to M-estimators is to make
ρ(u; σ) look like u² for part of its range and then flatten out; we expect that ρ(u; σ)
increases monotonically, and is close to a constant value for large u. A common
choice is

                              ρ(u; σ) = u²/(σ² + u²).

                            The parameter σ controls the point at which the function flattens out, and we have
                            plotted a variety of examples in Figure 10.6. There are many other M-estimators
                            available. Typically, they are discussed in terms of their influence function, which
                            is defined as
                                   ∂ρ/∂θ.
                            This is natural because our minimization criterion yields

                          Σᵢ ρ(rᵢ(xᵢ, θ); σ) ∂ρ/∂θ = 0

at the solution. For the kind of problems we consider, we would expect a good
influence function to be antisymmetric—there is no difference between a slight
overprediction and a slight underprediction—and to tail off with large values—
because we want to limit the influence of the outliers.
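To make the tail-off concrete, here is a small numeric sketch. For illustration it evaluates the derivative of the ρ above with respect to the residual u (a common way of assessing influence in practice), with σ = 1 chosen arbitrarily; both the derivative formula and the sample residuals are assumptions of this sketch.

```python
import numpy as np

SIGMA = 1.0

def rho(u, sigma=SIGMA):
    # Robust kernel from the text: rho(u; sigma) = u^2 / (sigma^2 + u^2).
    return u**2 / (sigma**2 + u**2)

def influence(u, sigma=SIGMA):
    # Derivative of rho with respect to the residual u:
    #   d rho / du = 2 u sigma^2 / (sigma^2 + u^2)^2.
    # It is antisymmetric in u and tails off when |u| >> sigma.
    return 2.0 * u * sigma**2 / (sigma**2 + u**2)**2

for u in (0.1, 1.0, 3.0, 10.0, 100.0):
    print(f"u = {u:6.1f}   rho = {rho(u):.4f}   influence = {influence(u):.6f}")
# A residual many multiples of sigma away contributes a rho value near 1
# and an influence near 0, so it barely moves the fitted parameters.
```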
                                 There are two tricky issues with using M-estimators. First, the minimization
                            problem is non-linear and must be solved iteratively. The standard difficulties
                            apply: there might be more than one local minimum, the method might diverge,
                            and the behavior of the method is likely to be quite dependent on the start point.
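As an illustration of the iterative minimization and its sensitivity to the start point, here is a sketch of fitting a line by minimizing the summed ρ cost with a general-purpose optimizer; the synthetic data, the value of SIGMA, and the use of SciPy's Nelder-Mead method are assumptions of this sketch, not a prescribed procedure.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic line data y = 2x + 1 with noise, plus a few gross outliers.
x = np.linspace(0, 10, 30)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(x.size)
y[::7] += 15.0                      # corrupt every 7th point

SIGMA = 1.0

def robust_cost(theta):
    # theta = (a, b); residuals of the line y = a*x + b,
    # accumulated through rho(r; sigma) instead of r^2.
    a, b = theta
    r = y - (a * x + b)
    return np.sum(r**2 / (SIGMA**2 + r**2))

# The objective is non-linear, so it is minimized iteratively; the answer
# can depend on where the iteration starts.
for start in [(0.0, 0.0), (2.0, 1.0), (-5.0, 20.0)]:
    result = minimize(robust_cost, x0=np.array(start), method="Nelder-Mead")
    print(f"start {start} -> a, b = {result.x.round(3)}")
```

Running the loop from several start points makes the dependence on initialization easy to observe directly.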