Page 304 - Probability and Statistical Inference
6 Sufficiency, Completeness, and Ancillarity
6.1 Introduction
Sir Ronald Aylmer Fisher published several path-breaking articles in the 1920s
which laid the foundation of statistical inference. Many fundamental concepts
and principles of statistical inference originated in the works of Fisher. The
most exciting thing about these concepts is that they are still alive, well, and
indispensable. Perhaps the deepest of all statistical concepts and principles is
what is known as sufficiency. The concept of sufficiency originated from
Fisher (1920) and later it blossomed further, again in the hands of Fisher
(1922). First, we introduce the notion of sufficiency, which helps in summarizing
data without any loss of information.
Consider a scenario like this. From past experience, suppose that a market
analyst postulates the monthly income per household in a small town to be
normally distributed with the unknown population mean µ and population
standard deviation σ = \$800. In order to guess the unknown µ, twenty-one
households are randomly selected, independently of each other, from the population,
leading to the observations $X_1 = x_1, X_2 = x_2, \ldots, X_{21} = x_{21}$. At this point,
the market analyst may be debating between using $\bar{x}$,
the observed value of the sample mean $\bar{X}$, as the guess and using $x_{21:11}$, the
observed value of the sample median $X_{21:11}$, instead. Now, the question is this:
which guess should the market analyst use in this situation? Since the income
distribution is assumed normal, $\bar{X}$ should be used because $\bar{X}$ is sufficient for
µ, as we will see later. On the other hand, the sample median $X_{21:11}$ is not
sufficient for µ. Once we develop the idea of sufficiency in Section 6.2, it will
be clear that the summary obtained via $\bar{X}$ preserves all the information contained
in the whole data $\mathbf{X} = (X_1, \ldots, X_{21})$, whereas in the alternative summary
obtained via $X_{21:11}$, some information from the data $\mathbf{X}$ will be lost. Common
terms such as estimator, statistic, information, and sufficiency
will all be defined shortly.
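As an informal illustration of the market analyst's dilemma, the following minimal sketch simulates repeated samples of twenty-one household incomes and compares how much the sample mean and the sample median vary from sample to sample. The true mean of 3000 dollars, the number of replications, and the function name are assumptions made only for this illustration; they do not come from the scenario above. A larger sampling variance for the median is consistent with the claim that the median discards some information about µ, but it is not a proof of insufficiency.

```python
import random
import statistics


def compare_mean_and_median(mu=3000.0, sigma=800.0, n=21, reps=10000, seed=0):
    """Estimate the sampling variances of the sample mean and sample median.

    mu is a hypothetical true mean chosen for illustration; sigma and n
    follow the scenario in the text (sigma = 800 dollars, n = 21 households).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        # For n = 21 the median is the 11th order statistic, X_{21:11}.
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)


var_mean, var_median = compare_mean_and_median()
theory = 800.0 ** 2 / 21  # sigma^2 / n, the exact variance of the sample mean
print(f"variance of sample mean:   {var_mean:.0f} (theory: {theory:.0f})")
print(f"variance of sample median: {var_median:.0f}")
```

Under these assumptions, the simulated variance of the sample mean should sit close to σ²/n, while the variance of the sample median should be noticeably larger.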
Section 6.2 includes two ways to find sufficient statistics in a statistical
model. The first method involves the direct calculation of the conditional
distribution of the data given the value of a particular statistic, while the