Page 368 - Probability and Statistical Inference
7. Point Estimation 345
$\eta_1$ and to come up with an estimator of $\sigma^2$. From Examples 7.2.4-7.2.5,
it is clear that the method of moments may lead to estimators that
depend upon non-sufficient statistics. On top of this, if we face situations
where the theoretical moments are infinite, we cannot hope to apply this
method. R. A. Fisher certainly realized the pitfalls of this methodology and
started criticizing Karl Pearson's way of finding estimators early on. Fisher
(1912) was critical of Pearson's approach to curve fitting and wrote on page
54 that "The method of moments ... though its arbitrary nature is apparent"
and went on to formulate the method of maximum likelihood in the same
paper. Fisher's preliminary ideas took concrete shape in a path-breaking
article that appeared in 1922, followed by more elaborate discussions laid out
in Fisher (1925a, 1934).
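Consider $X_1, \ldots, X_n$ which are iid with the common pmf or pdf
$f(x; \boldsymbol{\theta})$ where $x \in \mathcal{X} \subseteq \Re$ and
$\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k) \in \Theta \subseteq \Re^k$.
Here $\theta_1, \ldots, \theta_k$ are all assumed unknown and thus
$\boldsymbol{\theta}$ is an unknown vector-valued parameter. Recall the notion
of a likelihood function defined in (6.2.4). Having observed the data
$\mathbf{X} = \mathbf{x}$, we write down the likelihood function

$$L(\boldsymbol{\theta}) = \prod_{i=1}^{n} f(x_i; \boldsymbol{\theta}).$$
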
Note that the observed data $\mathbf{x} = (x_1, \ldots, x_n)$ is arbitrary but
otherwise held fixed.
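To make the role of the fixed data concrete, here is a minimal Python sketch that evaluates the likelihood as the product of $f(x_i; \theta)$ over the observed values, viewed as a function of $\theta$ alone. The Poisson pmf, the sample `observed`, and the function names are illustrative assumptions and do not come from the text.

```python
import math

def poisson_pmf(x, lam):
    """Poisson pmf f(x; lam) = exp(-lam) * lam**x / x!, for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def likelihood(theta, data, f):
    """L(theta): product of f(x_i; theta) over the observed data, data held fixed."""
    return math.prod(f(x, theta) for x in data)

# Hypothetical observed sample (an assumption made only for illustration).
observed = [2, 0, 3, 1, 2]

# The data stay fixed; only the parameter value changes.
for lam in (0.5, 1.0, 1.6, 2.5):
    print(lam, likelihood(lam, observed, poisson_pmf))
```

Among the four parameter values tried, the likelihood is largest at 1.6, which happens to equal the sample mean of this particular sample.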
Throughout this chapter and the ones that follow, we essentially
pay attention to the likelihood function when it is positive.
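Definition 7.2.2 The maximum likelihood estimate of $\boldsymbol{\theta}$ is the
value $\hat{\boldsymbol{\theta}} \equiv \hat{\boldsymbol{\theta}}(\mathbf{x})$ for which
$L(\hat{\boldsymbol{\theta}}) = \sup_{\boldsymbol{\theta} \in \Theta} L(\boldsymbol{\theta})$.
The maximum likelihood estimator (MLE) of $\boldsymbol{\theta}$ is denoted by
$\hat{\boldsymbol{\theta}} \equiv \hat{\boldsymbol{\theta}}(\mathbf{X})$. If we write
$\hat{\boldsymbol{\theta}}$, the context will dictate whether it is referring to an
estimate or an estimator of $\boldsymbol{\theta}$.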
When the $X$'s are discrete, $L(\boldsymbol{\theta})$ stands for
$P_{\boldsymbol{\theta}}\{\mathbf{X} = \mathbf{x}\}$, that is, the probability of
observing the type of data on hand when $\boldsymbol{\theta}$ is the true value of
the unknown parameter. The MLE is interpreted as the value of
$\boldsymbol{\theta}$ which maximizes the chance of observing the particular data
we already have on hand. When the $X$'s are continuous instead, a similar
interpretation is given by replacing the probability statement with an
analogous statement using the joint pdf of $\mathbf{X}$.
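The discrete-case interpretation can be illustrated with a small Python sketch. It assumes Bernoulli($p$) observations, so that the likelihood is the probability of the observed 0/1 sequence, and it locates the maximizer by a plain grid search over $[0, 1]$. The data, the grid size, and the function names are hypothetical, and the grid search is only one convenient way to find the maximum, not a prescribed method.

```python
def bernoulli_likelihood(p, data):
    """L(p) = P_p{X = x} = p**s * (1 - p)**(n - s), where s is the number of ones."""
    s = sum(data)
    n = len(data)
    return p ** s * (1 - p) ** (n - s)

def grid_mle(data, grid_size=1001):
    """Return the grid point in [0, 1] at which the likelihood is largest."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda p: bernoulli_likelihood(p, data))

# Hypothetical 0/1 observations: 5 ones out of 8 (an assumption for illustration).
observed = [1, 0, 1, 1, 0, 1, 0, 1]

p_hat = grid_mle(observed)
print(p_hat)
```

For this sample the grid search returns the sample proportion 5/8, the value of $p$ under which the observed sequence is most probable. A grid search only approximates the maximizer to within the grid spacing, so in general it is a rough numerical check and not an exact solution.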
As far as the definition of MLE goes, there is no hard and fast dictum
regarding any specific mathematical method to follow in order to locate
where $L(\boldsymbol{\theta})$ attains its supremum. If $L(\boldsymbol{\theta})$ is a
twice differentiable function of $\boldsymbol{\theta}$, then we may apply the
standard techniques from differential calculus