Page 368 - Probability and Statistical Inference

7. Point Estimation  345

$\eta_1$ and … to come up with an estimator of $\sigma^2$. From Examples 7.2.4-7.2.5, it is clear that the method of moments may lead to estimators that depend on non-sufficient statistics. Moreover, when the theoretical moments are infinite, the method cannot be applied at all. R. A. Fisher certainly realized the pitfalls of this methodology and began criticizing Karl Pearson’s way of finding estimators early on. Fisher (1912) was critical of Pearson’s approach to curve fitting and wrote on page 54 that “The method of moments ... though its arbitrary nature is apparent”, and he went on to formulate the method of maximum likelihood in the same paper. Fisher’s preliminary ideas took concrete shape in a path-breaking article that appeared in 1922, followed by more elaborate discussions laid out in Fisher (1925a, 1934).
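A standard case shows concretely how a moment-matching estimator can ignore the sufficient statistic. It is chosen here only for illustration and is not claimed to be the setting of Examples 7.2.4-7.2.5. Let $X_1, \ldots, X_n$ be iid Uniform$(0, \theta)$. Since $E(X) = \theta/2$, the method of moments gives $2\bar{X}$, whereas the sufficient statistic for $\theta$ is the largest order statistic $\max_i X_i$. The Python sketch below uses hypothetical data and hypothetical helper names (`mom_uniform`, `largest_order_stat`).

```python
# Illustrative assumption (not from the text): X_1, ..., X_n iid Uniform(0, theta).
# E(X) = theta / 2, so matching the first moment gives theta_mom = 2 * xbar,
# while the sufficient statistic for theta is max(X_i).

def mom_uniform(xs):
    """Method-of-moments estimate of theta for Uniform(0, theta) data."""
    xbar = sum(xs) / len(xs)
    return 2.0 * xbar

def largest_order_stat(xs):
    """Largest observation, the sufficient statistic for theta in this model."""
    return max(xs)

data = [0.1, 0.2, 0.9]  # hypothetical observed sample
theta_mom = mom_uniform(data)
x_max = largest_order_stat(data)
print(f"theta_mom = {theta_mom:.2f}, max(x) = {x_max:.2f}")
```

With these data the moment estimate, 0.80, falls below the observed maximum, 0.90. It therefore names a value of $\theta$ that the sample has already ruled out, because no Uniform$(0, 0.8)$ variable can equal 0.9.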
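Consider $X_1, \ldots, X_n$, which are iid with the common pmf or pdf $f(x; \boldsymbol{\theta})$, where $x \in \chi \subseteq \Re$ and $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k) \in \Theta \subseteq \Re^k$. Here $\theta_1, \ldots, \theta_k$ are all assumed unknown, and thus $\boldsymbol{\theta}$ is an unknown vector-valued parameter. Recall the notion of a likelihood function defined in (6.2.4). Having observed the data $\mathbf{X} = \mathbf{x}$, we write down the likelihood function

$$L(\boldsymbol{\theta}) = \prod_{i=1}^{n} f(x_i; \boldsymbol{\theta}).$$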
Note that the observed data $\mathbf{x} = (x_1, \ldots, x_n)$ is arbitrary but otherwise held fixed.
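The following minimal sketch shows the likelihood function as a function of $\boldsymbol{\theta}$ while $\mathbf{x}$ stays fixed. Purely for illustration, it assumes that $f(x; \boldsymbol{\theta})$ is the $N(\mu, \sigma^2)$ pdf with $\boldsymbol{\theta} = (\mu, \sigma^2)$, so $k = 2$. The data values and the function names `normal_pdf` and `likelihood` are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma2):
    """N(mu, sigma2) density evaluated at x (assumed model for illustration)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def likelihood(theta, xs):
    """L(theta) = prod_i f(x_i; theta) with theta = (mu, sigma2); xs is held fixed."""
    mu, sigma2 = theta
    value = 1.0
    for x in xs:
        value *= normal_pdf(x, mu, sigma2)
    return value

observed = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data x, fixed once observed
for theta in [(3.0, 2.0), (2.0, 2.0), (3.0, 1.0)]:
    # Only theta varies from call to call; the data never change.
    print(theta, likelihood(theta, observed))
```

Each call changes only $\boldsymbol{\theta}$. Comparing the printed values therefore compares how well different parameter values account for the same observed sample.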
Throughout this chapter and the ones that follow, we essentially restrict attention to the likelihood function where it is positive.
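Restricting attention to $L(\boldsymbol{\theta}) > 0$ also makes $\log L(\boldsymbol{\theta})$ well defined. Because the logarithm is strictly increasing, both functions are maximized at the same $\boldsymbol{\theta}$. The sketch below keeps the illustrative normal model and uses hypothetical data and function names. It shows one practical reason to work on the log scale: a product of many densities can underflow to 0 in double precision, while the sum of log-densities stays finite.

```python
import math

def normal_logpdf(x, mu, sigma2):
    """Log of the N(mu, sigma2) density at x (assumed model for illustration)."""
    return -0.5 * math.log(2.0 * math.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2)

def likelihood(mu, sigma2, xs):
    """Direct product of densities; underflows toward 0.0 for large n."""
    value = 1.0
    for x in xs:
        value *= math.exp(normal_logpdf(x, mu, sigma2))
    return value

def log_likelihood(mu, sigma2, xs):
    """Sum of log-densities; defined because every density here is positive."""
    return sum(normal_logpdf(x, mu, sigma2) for x in xs)

xs = [1.0] * 2000  # hypothetical sample with n = 2000
print(likelihood(0.0, 1.0, xs))      # 0.0 after floating-point underflow
print(log_likelihood(0.0, 1.0, xs))  # about -2837.88, still finite
```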
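Definition 7.2.2 The maximum likelihood estimate of $\boldsymbol{\theta}$ is the value $\hat{\boldsymbol{\theta}} \equiv \hat{\boldsymbol{\theta}}(\mathbf{x})$ for which $L(\hat{\boldsymbol{\theta}}) = \sup_{\boldsymbol{\theta} \in \Theta} L(\boldsymbol{\theta})$. The maximum likelihood estimator (MLE) of $\boldsymbol{\theta}$ is denoted by $\hat{\boldsymbol{\theta}}(\mathbf{X})$. If we write $\hat{\boldsymbol{\theta}}$, the context will dictate whether it refers to an estimate or an estimator of $\boldsymbol{\theta}$.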
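Definition 7.2.2 only asks for a maximizer of $L(\boldsymbol{\theta})$ over $\Theta$. One crude way to approximate it is to evaluate $\log L$ on a finite grid and keep the best point. The sketch below does this under the illustrative assumption of the normal model with $\boldsymbol{\theta} = (\mu, \sigma^2)$. The grid ranges, the step size and the helper names are arbitrary choices. For this model the exact MLEs are known to be $\bar{x}$ and $n^{-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, so the grid answer can be checked against them.

```python
import math

def log_likelihood(mu, sigma2, xs):
    """log L(mu, sigma2) for iid N(mu, sigma2) data xs (illustrative model)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - ss / (2.0 * sigma2)

def grid_mle(xs, mu_grid, sigma2_grid):
    """Return the grid point (mu, sigma2) with the largest log-likelihood."""
    best, best_val = None, -math.inf
    for mu in mu_grid:
        for s2 in sigma2_grid:
            val = log_likelihood(mu, s2, xs)
            if val > best_val:
                best, best_val = (mu, s2), val
    return best

xs = [1.0, 2.0, 3.0, 4.0, 5.0]                      # hypothetical data
mu_grid = [2.0 + 0.01 * i for i in range(201)]      # mu in [2, 4]
sigma2_grid = [0.5 + 0.01 * j for j in range(351)]  # sigma^2 in [0.5, 4]
mu_hat, s2_hat = grid_mle(xs, mu_grid, sigma2_grid)

# Closed-form normal MLEs, for comparison with the grid search.
xbar = sum(xs) / len(xs)
s2_closed = sum((x - xbar) ** 2 for x in xs) / len(xs)
print(mu_hat, s2_hat)   # close to (3.0, 2.0)
print(xbar, s2_closed)  # 3.0 2.0
```

Grid search only locates $\hat{\boldsymbol{\theta}}$ to within the grid spacing, and its cost grows quickly with $k$. It is used here because it follows the definition directly, without calculus.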
When the $X$’s are discrete, $L(\boldsymbol{\theta})$ stands for $P_{\boldsymbol{\theta}}\{\mathbf{X} = \mathbf{x}\}$, that is, the probability of observing the type of data on hand when $\boldsymbol{\theta}$ is the true value of the unknown parameter. The MLE is interpreted as the value of $\boldsymbol{\theta}$ that maximizes the chance of observing the particular data we already have on hand. When the $X$’s are continuous, a similar interpretation is obtained by replacing the probability statement with an analogous statement using the joint pdf of $\mathbf{X}$.
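A small Bernoulli example makes the discrete interpretation concrete. The model and the data are assumed here for illustration. $L(p)$ is exactly the probability of the observed 0/1 sequence, and for any fixed $p$ these probabilities sum to 1 over all $2^n$ possible sequences. The maximizing $p$ is the one under which the data on hand are most probable. The helper name `seq_prob` is hypothetical.

```python
from itertools import product

def seq_prob(seq, p):
    """P_p{X = x}: probability of the exact 0/1 sequence under iid Bernoulli(p)."""
    prob = 1.0
    for x in seq:
        prob *= p if x == 1 else (1.0 - p)
    return prob

observed = (1, 0, 1, 1)  # hypothetical data
n = len(observed)

# In the discrete case L(p) is a genuine probability: summing over every
# possible data vector of length n gives 1 for a fixed p (here p = 0.3).
total = sum(seq_prob(s, 0.3) for s in product((0, 1), repeat=n))

# The maximizer is the p that makes the observed sequence most probable.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: seq_prob(observed, p))
print(round(total, 12), p_hat)  # 1.0 0.75
```

Here $L(p) = p^3(1-p)$, which is maximized at $p = 3/4$, the proportion of ones in the sample.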
As far as the definition of the MLE goes, there is no hard-and-fast dictum regarding any specific mathematical method to follow in order to locate where $L(\boldsymbol{\theta})$ attains its supremum. If $L(\boldsymbol{\theta})$ is a twice differentiable function of $\boldsymbol{\theta}$, then we may apply the standard techniques from differential calculus