                              and covariance matrix Σ(θ)= Var[h(X)] of the sufficient statistic h(X).
                              Several authors [2, 3, 10] have noted the representations
$$dL(\theta) \;=\; [h(x) - \mu(\theta)]^t\, \Sigma(\theta)^{-1}\, d\mu(\theta) \qquad (3.3)$$
$$J(\theta) \;=\; d\mu(\theta)^t\, \Sigma(\theta)^{-1}\, d\mu(\theta), \qquad (3.4)$$
where dµ(θ) is the matrix of partial derivatives of µ(θ). If the vector γ(θ) in definition (3.2) is linear in θ, then
$$J(\theta) \;=\; -d^2 L(\theta) \;=\; -d^2 \beta(\theta),$$
and scoring coincides with Newton's method.
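Formulas (3.3) and (3.4) translate directly into one step of scoring: evaluate the score and the expected information at the current point and solve a single linear system for the update. The sketch below assumes user-supplied functions mu, Sigma, and dmu returning µ(θ), Σ(θ), and the matrix dµ(θ); the name scoring_step and its calling convention are illustrative rather than taken from the text.

import numpy as np

def scoring_step(theta, h_x, mu, Sigma, dmu):
    """One scoring update built from the representations (3.3) and (3.4)."""
    residual = h_x - mu(theta)                             # h(x) - mu(theta)
    Sinv_dmu = np.linalg.solve(Sigma(theta), dmu(theta))   # Sigma(theta)^{-1} dmu(theta)
    score = Sinv_dmu.T @ residual                          # dL(theta)^t, by (3.3)
    J = dmu(theta).T @ Sinv_dmu                            # expected information, by (3.4)
    return theta + np.linalg.solve(J, score)               # theta_new = theta + J^{-1} dL^t

Iterating scoring_step from a reasonable starting value gives the scoring algorithm; when γ(θ) is linear in θ, the identity above shows that the same iteration is Newton's method.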
Although we will not stop to derive the general formulas (3.3) and (3.4), it is instructive to consider the special case of a multinomial distribution with m trials and success probability p_i for category i. If X = (X_1, ..., X_l)^t denotes the random vector of counts and θ the model parameters, then the loglikelihood of the observed data X = x is
$$L(\theta) \;=\; \sum_{i=1}^{l} x_i \ln p_i(\theta) \;+\; \ln \binom{m}{x_1 \cdots x_l},$$
and consequently the score vector dL(θ) has entries
$$\frac{\partial}{\partial\theta_j} L(\theta) \;=\; \sum_{i=1}^{l} \frac{x_i}{p_i(\theta)}\,\frac{\partial}{\partial\theta_j} p_i(\theta).$$
Here θ_j is the jth component of θ. Because $\frac{\partial}{\partial\theta_j}\sum_{i=1}^{l} p_i(\theta) = \frac{\partial}{\partial\theta_j} 1 = 0$, the
expected information matrix J(θ) has entries
$$
\begin{aligned}
J(\theta)_{jk} &= \mathrm{E}\!\left[-\frac{\partial^2}{\partial\theta_j\,\partial\theta_k} L(\theta)\right] \\
&= \sum_{i=1}^{l} \mathrm{E}(X_i)\,\frac{1}{p_i(\theta)^2}\,\frac{\partial}{\partial\theta_j} p_i(\theta)\,\frac{\partial}{\partial\theta_k} p_i(\theta)
 \;-\; \sum_{i=1}^{l} \mathrm{E}(X_i)\,\frac{1}{p_i(\theta)}\,\frac{\partial^2}{\partial\theta_j\,\partial\theta_k} p_i(\theta) \\
&= m \sum_{i=1}^{l} \frac{1}{p_i(\theta)}\,\frac{\partial}{\partial\theta_j} p_i(\theta)\,\frac{\partial}{\partial\theta_k} p_i(\theta)
 \;-\; m \sum_{i=1}^{l} \frac{\partial^2}{\partial\theta_j\,\partial\theta_k} p_i(\theta) \qquad (3.5) \\
&= m \sum_{i=1}^{l} \frac{1}{p_i(\theta)}\,\frac{\partial}{\partial\theta_j} p_i(\theta)\,\frac{\partial}{\partial\theta_k} p_i(\theta).
\end{aligned}
$$
The third line substitutes E(X_i) = m p_i(θ), and the final line drops the second sum by the identity noted above.
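As a concrete check on (3.5), the short script below evaluates the multinomial score and expected information and runs a few scoring iterations for a one-parameter trinomial with p(θ) = (θ², 2θ(1−θ), (1−θ)²); this parametrization, the counts, and the helper names are hypothetical illustrations, not taken from the text.

import numpy as np

def multinomial_score(theta, x, p, dp):
    """Score entries: sum_i x_i / p_i(theta) * dp_i/dtheta_j."""
    return dp(theta).T @ (x / p(theta))

def multinomial_information(theta, m, p, dp):
    """Expected information (3.5): J_jk = m sum_i (dp_i/dtheta_j)(dp_i/dtheta_k) / p_i."""
    probs, grads = p(theta), dp(theta)            # grads has one row per category
    return m * (grads / probs[:, None]).T @ grads

# Hypothetical one-parameter trinomial: p(theta) = (theta^2, 2 theta (1 - theta), (1 - theta)^2).
p  = lambda t: np.array([t[0]**2, 2*t[0]*(1 - t[0]), (1 - t[0])**2])
dp = lambda t: np.array([[2*t[0]], [2 - 4*t[0]], [-2*(1 - t[0])]])

x = np.array([30.0, 50.0, 20.0])                  # hypothetical counts, m = 100 trials
theta = np.array([0.5])
for _ in range(5):                                # scoring iterations
    J = multinomial_information(theta, x.sum(), p, dp)
    theta = theta + np.linalg.solve(J, multinomial_score(theta, x, p, dp))
print(theta)                                      # settles at the maximum likelihood estimate 0.55

For this illustrative parametrization the sum in (3.5) simplifies to J(θ) = 2m/[θ(1−θ)], and the iteration settles at the maximum likelihood estimate 0.55.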