Page 58 - Applied Probability
3. Newton’s Method and Scoring
and covariance matrix Σ(θ) = Var[h(X)] of the sufficient statistic h(X).
Several authors [2, 3, 10] have noted the representations
$$
dL(\theta) = [h(x) - \mu(\theta)]^t \, \Sigma(\theta)^{-1} \, d\mu(\theta), \tag{3.3}
$$

$$
J(\theta) = d\mu(\theta)^t \, \Sigma(\theta)^{-1} \, d\mu(\theta), \tag{3.4}
$$
where dµ(θ) is the matrix of partial derivatives of µ(θ). If the vector γ(θ)
in definition (3.2) is linear in θ, then
$$
J(\theta) = -d^2 L(\theta) = -d^2 \beta(\theta),
$$
and scoring coincides with Newton’s method.
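To make (3.3) and (3.4) concrete, here is a minimal Python sketch for one hypothetical case not treated in the text: a binomial count X with m trials whose success probability is written through the logit parameter θ, p = 1/(1 + e^{−θ}). Then h(X) = X, µ(θ) = mp, Σ(θ) = mp(1 − p) and dµ(θ) = mp(1 − p). Assuming definition (3.2) writes the density as g(x) exp[β(θ) + h(x)^t γ(θ)], this parameterization has γ(θ) = θ, which is linear, so the expected information from (3.4) should match a finite-difference estimate of −d²L(θ), and the scoring iterations are Newton's iterations. The function names and the values x = 7, m = 20 are illustrative choices only.

```python
import math

# Hypothetical illustration: X ~ Binomial(m, p) with logit parameter theta.
# h(X) = X, mu(theta) = m p, Sigma(theta) = m p (1 - p).

def p_of(theta):
    """Inverse logit: success probability as a function of theta."""
    return 1.0 / (1.0 + math.exp(-theta))

def loglik(theta, x, m):
    """Binomial loglikelihood, including the log binomial coefficient."""
    p = p_of(theta)
    log_binom = math.lgamma(m + 1) - math.lgamma(x + 1) - math.lgamma(m - x + 1)
    return log_binom + x * math.log(p) + (m - x) * math.log(1.0 - p)

def score_and_info(theta, x, m):
    """Score via (3.3) and expected information via (3.4), one-parameter case."""
    p = p_of(theta)
    mu = m * p                    # mu(theta) = E[h(X)]
    sigma = m * p * (1.0 - p)     # Sigma(theta) = Var[h(X)]
    dmu = m * p * (1.0 - p)       # d mu / d theta
    score = (x - mu) / sigma * dmu   # [h(x) - mu]^t Sigma^{-1} d mu
    info = dmu / sigma * dmu         # d mu^t Sigma^{-1} d mu
    return score, info

def neg_second_derivative(theta, x, m, h=1e-3):
    """Central-difference approximation to -d^2 L(theta)."""
    return -(loglik(theta + h, x, m) - 2.0 * loglik(theta, x, m)
             + loglik(theta - h, x, m)) / h ** 2

def scoring(x, m, theta=0.0, steps=25):
    """Scoring iterations theta <- theta + J(theta)^{-1} dL(theta)."""
    for _ in range(steps):
        s, j = score_and_info(theta, x, m)
        theta += s / j
    return theta

x, m, theta0 = 7, 20, 0.3
s, j = score_and_info(theta0, x, m)
print(s, j, neg_second_derivative(theta0, x, m))
print(scoring(x, m), math.log(x / (m - x)))  # MLE of the logit is ln[x/(m - x)]
```

Because γ(θ) = θ is linear here, the agreement between `j` and `neg_second_derivative` is the numerical counterpart of J(θ) = −d²L(θ). With a non-canonical link, such as p = Φ(θ), the observed and expected information generally differ and scoring no longer coincides with Newton's method.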
Although we will not stop to derive the general formulas (3.3) and (3.4),
it is instructive to consider the special case of a multinomial distribution
with $m$ trials and success probability $p_i$ for category $i$. If $X = (X_1, \ldots, X_l)^t$
denotes the random vector of counts and θ the model parameters, then the
loglikelihood of the observed data X = x is
$$
L(\theta) = \sum_{i=1}^{l} x_i \ln p_i(\theta) + \ln \binom{m}{x_1 \, \cdots \, x_l},
$$
and consequently the score vector dL(θ) has entries
$$
\frac{\partial}{\partial \theta_j} L(\theta) = \sum_{i=1}^{l} \frac{x_i}{p_i(\theta)} \frac{\partial}{\partial \theta_j} p_i(\theta).
$$
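Here $\theta_j$ is the $j$th component of $\theta$. Because $\frac{\partial}{\partial \theta_j} \sum_{i=1}^{l} p_i(\theta) = \frac{\partial}{\partial \theta_j} 1 = 0$, and likewise $\frac{\partial^2}{\partial \theta_j \partial \theta_k} \sum_{i=1}^{l} p_i(\theta) = 0$, the
expected information matrix $J(\theta)$ has entries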
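$$
\begin{aligned}
J(\theta)_{jk} &= E\left[-\frac{\partial^2}{\partial \theta_j \partial \theta_k} L(\theta)\right] \\
&= \sum_{i=1}^{l} E(X_i) \frac{1}{p_i(\theta)^2} \frac{\partial}{\partial \theta_j} p_i(\theta) \frac{\partial}{\partial \theta_k} p_i(\theta)
 - \sum_{i=1}^{l} E(X_i) \frac{1}{p_i(\theta)} \frac{\partial^2}{\partial \theta_j \partial \theta_k} p_i(\theta) \\
&= m \sum_{i=1}^{l} \frac{1}{p_i(\theta)} \frac{\partial}{\partial \theta_j} p_i(\theta) \frac{\partial}{\partial \theta_k} p_i(\theta)
 - m \sum_{i=1}^{l} \frac{\partial^2}{\partial \theta_j \partial \theta_k} p_i(\theta) \\
&= m \sum_{i=1}^{l} \frac{1}{p_i(\theta)} \frac{\partial}{\partial \theta_j} p_i(\theta) \frac{\partial}{\partial \theta_k} p_i(\theta).
\end{aligned}
\tag{3.5}
$$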
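The multinomial score and formula (3.5) are all that scoring needs, so they translate directly into code. Below is a minimal, dependency-free Python sketch. Its interface is an assumption of the sketch: a hypothetical `model(theta)` returns the probabilities p_i(θ) and the partial derivatives ∂p_i(θ)/∂θ_j. The two test models are illustrative and not from the text. The first is Hardy-Weinberg genotype proportions ((1 − t)², 2t(1 − t), t²) in a single allele frequency t, with closed-form MLE (x₂ + 2x₃)/(2m). The second is a saturated trinomial parameterized by θ = (p₁, p₂), with MLE (x₁/m, x₂/m).

```python
# Sketch: scoring for a multinomial model with p_i = p_i(theta).
# Assumed (hypothetical) interface: model(theta) -> (p, dp), where
# p[i] = p_i(theta) and dp[i][j] = d p_i / d theta_j.

def score(x, p, dp):
    """Score vector: dL/dtheta_j = sum_i x_i / p_i * dp_i/dtheta_j."""
    n = len(dp[0])
    return [sum(x[i] / p[i] * dp[i][j] for i in range(len(p))) for j in range(n)]

def expected_info(m, p, dp):
    """Formula (3.5): J_jk = m * sum_i (dp_i/dtheta_j)(dp_i/dtheta_k) / p_i."""
    n = len(dp[0])
    return [[m * sum(dp[i][j] * dp[i][k] / p[i] for i in range(len(p)))
             for k in range(n)] for j in range(n)]

def solve(a, b):
    """Solve a y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    y = [0.0] * n
    for r in reversed(range(n)):
        y[r] = (aug[r][n] - sum(aug[r][c] * y[c] for c in range(r + 1, n))) / aug[r][r]
    return y

def fisher_scoring(x, model, theta, steps=10, tol=1e-12):
    """Iterate theta <- theta + J(theta)^{-1} dL(theta) until the step is tiny."""
    m = sum(x)
    theta = list(theta)
    for _ in range(steps):
        p, dp = model(theta)
        delta = solve(expected_info(m, p, dp), score(x, p, dp))
        theta = [t + d for t, d in zip(theta, delta)]
        if max(abs(d) for d in delta) < tol:
            break
    return theta

def hardy_weinberg(theta):
    """Hypothetical example: genotype proportions in allele frequency t."""
    t = theta[0]
    p = [(1 - t) ** 2, 2 * t * (1 - t), t ** 2]
    dp = [[-2 * (1 - t)], [2 - 4 * t], [2 * t]]
    return p, dp

def trinomial(theta):
    """Hypothetical example: saturated trinomial, p3 = 1 - p1 - p2."""
    p1, p2 = theta
    p = [p1, p2, 1 - p1 - p2]
    dp = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    return p, dp

x = [30, 50, 20]
print(fisher_scoring(x, hardy_weinberg, [0.5]))       # compare (x2 + 2 x3) / (2 m)
print(fisher_scoring(x, trinomial, [1 / 3, 1 / 3]))   # compare (x1 / m, x2 / m)
```

For both models a single scoring step from any interior starting point lands on the closed-form estimate. For Hardy-Weinberg, (3.5) collapses to J(t) = 2m/[t(1 − t)], the information of a binomial with 2m trials. The hand-written `solve` only keeps the sketch free of dependencies; with NumPy, `numpy.linalg.solve` would serve the same purpose.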