is small.
10.5 FITTING USING PROBABILISTIC MODELS
It is straightforward to build probabilistic models from the fitting procedures we
have described. Doing so yields a new kind of model, and a new algorithm; both
are extremely useful in practice. The key is to view our observed data as having
been produced by a generative model. The generative model specifies how each
data point was produced.
In the simplest case, line fitting with least squares, we can recover the same
equations we worked with in Section 10.2.1 by using a natural generative model.
Our model for producing data is that the $x$ coordinate is uniformly distributed
and the $y$ coordinate is generated by (a) finding the point $ax_i + b$ on the line corre-
sponding to the $x$ coordinate and then (b) adding a zero-mean normally distributed
random variable. Now write $x \sim p$ to mean that $x$ is a sample from the probability
distribution $p$; write $U(R)$ for the uniform distribution over some range of values
$R$; and write $N(\mu, \sigma^2)$ for the normal distribution with mean $\mu$ and variance $\sigma^2$.
With our notation, we can write:
\begin{align*}
x_i &\sim U(R) \\
y_i &\sim N(ax_i + b, \sigma^2).
\end{align*}
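To make this concrete, here is a minimal sketch (in Python with NumPy) of sampling data from this generative model; the particular values of $a$, $b$, $\sigma$, and the range $R$ are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) parameters: line y = a*x + b, noise level sigma, range R.
a_true, b_true, sigma = 0.7, 1.5, 0.2
R = (0.0, 10.0)
n = 100

# x_i ~ U(R);  y_i ~ N(a*x_i + b, sigma^2)
x = rng.uniform(R[0], R[1], size=n)
y = a_true * x + b_true + rng.normal(0.0, sigma, size=n)
```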
We can estimate the unknown parameters of this model in a straightforward way.
The important parameters are a and b (though knowing σ might be useful). The
usual way to estimate parameters in a probabilistic model is to maximize the like-
lihood of the data, typically by working with the negative log-likelihood and mini-
mizing that. In this case, the log-likelihood of the data is
\begin{align*}
L(a, b, \sigma) &= \sum_{i \in \text{data}} \log P(x_i, y_i \mid a, b, \sigma) \\
&= \sum_{i \in \text{data}} \left( \log P(y_i \mid x_i, a, b, \sigma) + \log P(x_i) \right) \\
&= \sum_{i \in \text{data}} \left( -\frac{(y_i - (ax_i + b))^2}{2\sigma^2} - \frac{1}{2}\log 2\pi\sigma^2 \right) + K_b
\end{align*}
where $K_b$ is a constant representing $\sum_{i \in \text{data}} \log P(x_i)$. Now, to minimize the negative log-
likelihood as a function of $a$ and $b$, we could minimize $\sum_{i \in \text{data}} (y_i - (ax_i + b))^2$
as a function of $a$ and $b$ (which is what we did for least-squares line fitting in
Section 10.2.1).
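The following is a minimal sketch of this equivalence (again Python with NumPy; the helper name fit_line_ml is ours, not from the text): maximizing the likelihood over $a$ and $b$ reduces to an ordinary least-squares solve, after which $\sigma$ can be read off from the residuals.

```python
import numpy as np

def fit_line_ml(x, y):
    """Maximum-likelihood fit of the model y_i ~ N(a*x_i + b, sigma^2).

    For a and b this is exactly least-squares line fitting; sigma is then
    estimated from the residuals.
    """
    A = np.column_stack([x, np.ones_like(x)])        # rows [x_i, 1]
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes sum_i (y_i - (a*x_i + b))^2
    residuals = y - (a * x + b)
    sigma = np.sqrt(np.mean(residuals**2))           # ML estimate of sigma
    return a, b, sigma
```

On data generated as in the sketch above, fit_line_ml(x, y) should recover estimates close to the parameters used to produce it.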
Now consider total least-squares line fitting. Again, we can recover the equa-
tions we worked with in Section 10.2.1 from a natural generative model. In this
case, to generate a data point $(x_i, y_i)$, we generate a point $(u_i, v_i)$ uniformly at ran-
dom along the line (or rather, along a finite-length segment of the line likely to be of
interest to us), then sample a distance $\xi_i$ (where $\xi_i \sim N(0, \sigma^2)$), and move the point
$(u_i, v_i)$ perpendicular to the line by that distance.
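Here is a minimal sketch of this sampling process (Python with NumPy; the particular line, segment length, and noise level are illustrative assumptions). It displaces each point along the unit normal to the line, which, as the next sentence makes explicit, is the direction $(a, b)$ when the line is written as $ax + by + c = 0$ with $a^2 + b^2 = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative (assumed) line a*x + b*y + c = 0 with a^2 + b^2 = 1, and noise level.
theta = 0.3
a, b, c = np.cos(theta), np.sin(theta), -2.0
sigma, n = 0.1, 100

p0 = np.array([-a * c, -b * c])        # a point on the line (the one closest to the origin)
tangent = np.array([-b, a])            # unit direction along the line
normal = np.array([a, b])              # unit direction perpendicular to the line

# (u_i, v_i): points drawn uniformly along a finite segment of the line.
u_v = p0 + rng.uniform(-5.0, 5.0, size=(n, 1)) * tangent

# Displace each point perpendicular to the line by xi_i ~ N(0, sigma^2).
xi = rng.normal(0.0, sigma, size=(n, 1))
x_y = u_v + xi * normal
```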
If the line is $ax + by + c = 0$
and if $a^2 + b^2 = 1$, we have that $(x_i, y_i) = (u_i, v_i) + \xi_i (a, b)$. We can write the