Page 304 - Probability and Statistical Inference

6

                           Sufficiency, Completeness, and Ancillarity



                           6.1 Introduction


                           Sir Ronald Aylmer Fisher published several path-breaking articles in the 1920s
                           which laid the foundation of statistical inference. Many fundamental concepts
                           and principles of statistical inference originated in the works of Fisher. The
                           most exciting thing about these concepts is that they are still alive, well, and
                           indispensable. Perhaps the deepest of all statistical concepts and principles is
                           what is known as sufficiency. The concept of sufficiency originated with
                           Fisher (1920) and later blossomed further, again in the hands of Fisher
                           (1922). First we introduce the notion of sufficiency, which helps in
                           summarizing data without any loss of information.
                           Consider a scenario like this. From past experience, suppose that a market
                           analyst postulates the monthly income per household in a small town to be
                           normally distributed with the unknown population mean µ and population
                           standard deviation σ = $800. In order to guess the unknown µ, twenty-one
                           households are randomly selected, independently of each other, from the
                           population, leading to the observations X_1 = x_1, X_2 = x_2, ..., X_21 = x_21.
                           At this point, the market analyst may be debating between the appropriateness
                           of using x̄, the observed value of the sample mean X̄, as the guess or using
                           x_{21:11}, the observed value of the sample median X_{21:11}, instead. Now, the
                           question is this:
                           which guess should the market analyst use in this situation? Since the income
                           distribution is assumed normal, X̄ should be used because X̄ is sufficient for
                           µ, as we will see later. On the other hand, the sample median X_{21:11} is not
                           sufficient for µ. Once we develop the idea of sufficiency in Section 6.2, it will
                           be clear that the summary obtained via X̄ preserves all the information
                           contained in the whole data X = (X_1, ..., X_21), whereas in the alternative
                           summary obtained via X_{21:11}, some information from the data X will be lost.
                           Common terms such as estimator, statistic, information, and sufficiency
                           will all be defined shortly.
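
The market analyst's choice between the sample mean and the sample median can be explored numerically. What follows is a minimal simulation sketch in Python under stated assumptions: n = 21 and σ = 800 come from the scenario, while the value MU_ASSUMED = 3000, the number of replications, and the seed are hypothetical choices made only so that the simulation can run (µ is unknown in the scenario). A smaller sampling variance for the mean is a consequence of sufficiency under normality, not its definition, so this should be read as a rough illustration rather than a demonstration of sufficiency itself.

```python
import random
import statistics

SIGMA = 800.0        # population standard deviation, from the scenario
N = 21               # number of households sampled, from the scenario
MU_ASSUMED = 3000.0  # hypothetical "true" mean, assumed only for this simulation


def simulate(reps=20000, seed=0):
    """Return (variance of sample means, variance of sample medians)
    across `reps` independent normal samples of size N."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(MU_ASSUMED, SIGMA) for _ in range(N)]
        means.append(statistics.fmean(sample))
        # For N = 21 the sample median is the 11th order statistic.
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)


var_mean, var_median = simulate()
print(f"theoretical variance of sample mean: {SIGMA**2 / N:.1f}")
print(f"simulated variance of sample mean:   {var_mean:.1f}")
print(f"simulated variance of sample median: {var_median:.1f}")
```

Under these assumptions the simulated variance of the sample mean should land close to σ²/n, and the sample median should show a visibly larger variance, which is one symptom of the information it discards.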
                              Section 6.2 includes two ways to find sufficient statistics in a statistical
                           model. The first method involves the direct calculation of the conditional
                           distribution of the data given the value of a particular statistic, while the