Page 307 - Probability and Statistical Inference

284    6. Sufficiency, Completeness, and Ancillarity

                                 µ)/σ, is not a statistic because it involves the unknown parameter µ, and
                                 hence its value associated with any observed data x_1, ..., x_n cannot be
                                 calculated.
                                    Definition 6.2.2 A real valued statistic T is called sufficient (for the
                                 unknown parameter θ) if and only if the conditional distribution of the random
                                 sample X = (X_1, ..., X_n) given T = t does not involve θ, for all t ∈ 𝒯, the
                                 domain space for T.
                                    In other words, given the value t of a sufficient statistic T, conditionally
                                 there is no more “information” left in the original data regarding the unknown
                                 parameter θ. Put another way, we may think of X trying to tell us a story
                                 about θ, but once a sufficient summary T becomes available, the original
                                 story then becomes redundant. Observe that the whole data X is always suf-
                                 ficient for θ in this sense. But, we are aiming at a “shorter” summary statistic
                                 which has the same amount of information available in X. Thus, once we find
                                 a sufficient statistic T, we will focus only on the summary statistic T. Before
                                 we give other details, we define the concept of joint sufficiency of a vector
                                 valued statistic T for an unknown parameter θ.
                                    Definition 6.2.3 A vector valued statistic T ≡ (T_1, ..., T_k), where
                                 T_i ≡ T_i(X_1, ..., X_n), i = 1, ..., k, is called jointly sufficient (for the
                                 unknown parameter θ) if and only if the conditional distribution of
                                 X = (X_1, ..., X_n) given T = t does not involve θ, for all t ∈ 𝒯 ⊆ ℜ^k.
                                    Section 6.2.1 shows how the conditional distribution of X given T = t
                                 can be evaluated. Section 6.2.2 provides the celebrated Neyman factorization,
                                 which plays a fundamental role in locating sufficient statistics.
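
For discrete data, the conditional distribution appearing in Definitions 6.2.2 and 6.2.3 can be written out explicitly. The display below is only a sketch under the added assumptions that X is discrete and that P_θ{T = t} > 0; the bold symbols and the notation T(x) for the value of the statistic at the data point x are introduced here for illustration.

```latex
P_\theta\{\mathbf{X} = \mathbf{x} \mid T = t\}
  = \frac{P_\theta\{\mathbf{X} = \mathbf{x},\; T = t\}}{P_\theta\{T = t\}}
  = \begin{cases}
      \dfrac{P_\theta\{\mathbf{X} = \mathbf{x}\}}{P_\theta\{T = t\}}
        & \text{if } T(\mathbf{x}) = t, \\[2ex]
      0 & \text{otherwise.}
    \end{cases}
```

In this form, T is sufficient exactly when the ratio in the first case is free of θ for every x and every t ∈ 𝒯.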
                                 6.2.1   The Conditional Distribution Approach
                                 With the help of examples, we show how Definition 6.2.2 can be applied
                                 to find sufficient statistics for an unknown parameter θ.
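
For small samples, the conditional distribution approach can also be checked by brute-force enumeration. The following is a minimal sketch, assuming iid Bernoulli(p) data as in Example 6.2.1; the function names, the choice n = 4, t = 2, and the two trial values of p are hypothetical and serve only as illustration. It compares conditioning on the sum of the observations with conditioning on the first observation alone.

```python
from itertools import product
from math import comb, isclose


def conditional_pmf_given_sum(n, t, p):
    """Return {x: P(X = x | T = t)} for iid Bernoulli(p) data, T = sum(x).

    Uses P(A | B) = P(A and B) / P(B), enumerating all 2**n sequences.
    """
    # Joint probabilities P(X = x), restricted to the event {T = t}.
    joint = {
        x: p ** sum(x) * (1 - p) ** (n - sum(x))
        for x in product((0, 1), repeat=n)
        if sum(x) == t
    }
    prob_t = sum(joint.values())  # P(T = t)
    return {x: q / prob_t for x, q in joint.items()}


def conditional_pmf_given_first(n, x1, p):
    """Same computation, but conditioning on the statistic X_1 alone."""
    joint = {
        x: p ** sum(x) * (1 - p) ** (n - sum(x))
        for x in product((0, 1), repeat=n)
        if x[0] == x1
    }
    prob_x1 = sum(joint.values())  # P(X_1 = x1)
    return {x: q / prob_x1 for x, q in joint.items()}


n, t = 4, 2
cond_a = conditional_pmf_given_sum(n, t, p=0.3)
cond_b = conditional_pmf_given_sum(n, t, p=0.8)

# Conditioning on the sum: identical for both p, and each sequence
# with t ones has conditional probability 1 / C(n, t).
same_for_sum = all(isclose(cond_a[x], cond_b[x]) for x in cond_a)
uniform = all(isclose(v, 1 / comb(n, t)) for v in cond_a.values())

# Conditioning on X_1 only: the conditional probabilities change with p.
first_a = conditional_pmf_given_first(n, 1, p=0.3)
first_b = conditional_pmf_given_first(n, 1, p=0.8)
same_for_first = all(isclose(first_a[x], first_b[x]) for x in first_a)

print(same_for_sum, uniform, same_for_first)
```

Under these assumptions the script prints `True True False`: given the sum, the conditional probabilities do not change with p, whereas given X_1 alone they do. A check at two values of p is only a numerical illustration, not a proof of sufficiency.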
                                    Example 6.2.1 Suppose that X_1, ..., X_n are iid Bernoulli(p), where p is the
                                 unknown parameter, 0 < p < 1. Here, χ = {0, 1}, θ = p, and Θ = (0, 1). Let us
                                 consider the specific statistic T = Σ_{i=1}^n X_i. Its values are denoted by
                                 t ∈ 𝒯 = {0, 1, 2, ..., n}. We verify that T is sufficient for p by showing that
                                 the conditional distribution of (X_1, ..., X_n) given T = t does not involve p,
                                 whatever be t ∈ 𝒯. From Examples 4.2.2-4.2.3, recall that T has the
                                 Binomial(n, p) distribution. Now, we obviously have:
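                                    P{X_1 = x_1, ..., X_n = x_n | T = t} = 0 whenever Σ_{i=1}^n x_i ≠ t.
                                 But, when Σ_{i=1}^n x_i = t, since A = {X_1 = x_1, ..., X_n = x_n} is a subset
                                 of B = {T = t},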