Page 305 - Probability and Statistical Inference

282    6. Sufficiency, Completeness, and Ancillarity

                                 second approach consists of the classical Neyman factorization of a
                                 likelihood function. We include specific examples to highlight some
                                 approaches to verify whether a statistic is or is not sufficient.
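One way to check sufficiency directly is to see whether the conditional distribution of the sample, given the statistic, is free of the parameter. The sketch below does this by exact enumeration. It assumes a small iid Bernoulli(p) sample, which is an illustrative choice and not a model taken from this page; the names `joint_pmf`, `conditional_given`, `depends_on_p` and `first_only` are hypothetical helpers.

```python
from itertools import product


def joint_pmf(x, p):
    """Joint pmf of iid Bernoulli(p) observations x = (x_1, ..., x_n)."""
    s = sum(x)
    return p ** s * (1 - p) ** (len(x) - s)


def conditional_given(stat, n, p):
    """Return {t: {x: P(X = x | stat(X) = t)}} by enumerating all 2^n samples."""
    groups = {}
    for x in product((0, 1), repeat=n):
        groups.setdefault(stat(x), {})[x] = joint_pmf(x, p)
    conditional = {}
    for t, members in groups.items():
        total = sum(members.values())  # P(stat(X) = t)
        conditional[t] = {x: w / total for x, w in members.items()}
    return conditional


def depends_on_p(stat, n, p_values=(0.2, 0.7), tol=1e-12):
    """True if the conditional law of the sample given stat changes with p."""
    first, *rest = [conditional_given(stat, n, p) for p in p_values]
    return any(
        abs(first[t][x] - other[t][x]) > tol
        for other in rest
        for t in first
        for x in first[t]
    )


def first_only(x):
    """A statistic that keeps only the first observation (assumed example)."""
    return x[0]


print(depends_on_p(sum, 3))         # False: the total is sufficient for p
print(depends_on_p(first_only, 3))  # True: X_1 alone is not sufficient
```

Comparing only two values of p can expose a dependence on p but cannot prove its absence, so a `False` result is a numerical check and not a proof of sufficiency.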
                                    In Section 6.3, the notion of minimal sufficiency is introduced and a
                                 fundamental result due to Lehmann and Scheffé (1950) is discussed. This
                                 result helps us in locating, in some sense, the best sufficient statistic, if it
                                 exists. We saw in Section 3.8 that many standard statistical models such as
                                 the binomial, Poisson, normal, gamma and several others belong to an
                                 exponential family. It is often a simple matter to locate the minimal
                                 sufficient statistic and its distribution in an exponential family. One gets a
                                 glimpse of this in Theorems 6.3.3-6.3.4.
                                    Section 6.4 introduces the idea of quantifying information in both one-
                                 and two-parameter situations, in a fairly elementary fashion. By means of
                                 examples, we show that the information contained in the whole data is
                                 indeed preserved by the sufficient statistics. In the one-parameter case, we
                                 compare the information content in a non-sufficient statistic with that in
                                 the data and find the extent of the lost information if a non-sufficient
                                 statistic is used as a summary.
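The comparison described above can be sketched numerically. The example assumes that "information" means Fisher information, E[(d/dp log f(X; p))^2], and again uses an iid Bernoulli(p) sample chosen only for illustration. The function `fisher_information` is a hypothetical helper, and n = 5 and p = 0.3 are arbitrary values.

```python
from itertools import product
from math import comb, log


def fisher_information(pmf, support, p, eps=1e-6):
    """Approximate E[(d/dp log pmf(X; p))^2] by exact summation over the support.

    The score (derivative of the log-pmf in p) uses a central difference.
    """
    info = 0.0
    for x in support:
        score = (log(pmf(x, p + eps)) - log(pmf(x, p - eps))) / (2 * eps)
        info += pmf(x, p) * score ** 2
    return info


n, p = 5, 0.3  # illustrative values only


def sample_pmf(x, q):
    """Whole data (X_1, ..., X_n), iid Bernoulli(q)."""
    return q ** sum(x) * (1 - q) ** (len(x) - sum(x))


def total_pmf(t, q):
    """T = X_1 + ... + X_n is Binomial(n, q); T is sufficient for q."""
    return comb(n, t) * q ** t * (1 - q) ** (n - t)


def first_pmf(u, q):
    """U = X_1 alone is Bernoulli(q); U is not sufficient when n > 1."""
    return q ** u * (1 - q) ** (1 - u)


info_data = fisher_information(sample_pmf, list(product((0, 1), repeat=n)), p)
info_total = fisher_information(total_pmf, range(n + 1), p)
info_first = fisher_information(first_pmf, (0, 1), p)

print(info_data, info_total, info_first)
```

Under these assumptions the first two values agree with n / (p(1 - p)) up to numerical error, while the third is only 1 / (p(1 - p)), so summarizing the data by X_1 alone keeps a fraction 1/n of the information.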
                                    The topic of ancillarity is discussed in Section 6.5, again moving deeper
                                 into the concepts and highlighting the fact that ancillary statistics can be
                                 useful in making statistical inferences. We include the location, scale, and
                                 location-scale families of distributions in Section 6.5.1. Section 6.6
                                 introduces the concept of completeness and discusses some of the roles
                                 complete sufficient statistics play within the realm of statistical inference.
                                 Section 6.6.2 highlights Basu’s Theorem from Basu (1955a).

                                 6.2 Sufficiency



                                 Suppose that we start a statistical investigation with observable iid random
                                 variables X_1, ..., X_n, having a common pmf or pdf f(x), x ∈ χ, the domain
                                 space for x. Here, n is the sample size which is assumed known. Practically
                                 speaking, we like to think that we are going to observe X_1, ..., X_n from a
                                 population whose distribution is approximated well by f(x). In the example
                                 discussed in the introduction, the market analyst is interested in the income
                                 distribution of households per month which is denoted by f(x), with some
                                 appropriate space χ for x. The income distribution may be indexed by some
                                 parameter (or parameter vector) θ (or 𝜽) which captures important features
                                 of the distribution. A practical significance of indexing with the parameter θ
                                 (or 𝜽) is that once we know the value of θ (or 𝜽), the population distribution
                                 f(x) would then be completely specified.
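A minimal sketch of what "indexed by a parameter" means in practice follows. It assumes, purely for illustration, that monthly household income follows a normal family with parameter vector (mu, sigma); the family, the function name `income_distribution`, and all numbers are hypothetical and do not come from the text.

```python
from statistics import NormalDist


def income_distribution(theta):
    """Return the member of the (hypothetical) normal family indexed by theta = (mu, sigma)."""
    mu, sigma = theta
    return NormalDist(mu, sigma)


# Two illustrative parameter values; the numbers are made up for the sketch.
f_a = income_distribution((4000.0, 1000.0))
f_b = income_distribution((5000.0, 1500.0))

# With theta fixed, every probability statement about X is determined,
# for example P(X <= 5000) under each member of the family.
print(f_a.cdf(5000.0), f_b.cdf(5000.0))
```

Each value of the parameter vector picks out exactly one distribution, so inference about the population reduces to inference about that vector.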