Page 107 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

86       3 Estimating Data Parameters


           interval, following  the procedure explained in the  previous  section, is  now
           computed as:

$$ \bar{x} - z_{1-\alpha/2}\,\sigma/\sqrt{n} \;<\; \mu \;<\; \bar{x} + z_{1-\alpha/2}\,\sigma/\sqrt{n}. \tag{3.9} $$

              As shown in Figure 3.3, with increasing n the distribution of $\bar{X}$ becomes more
           peaked; therefore, the confidence intervals narrow as n grows (the precision of our
           estimate of the mean increases). This is precisely why computing averages is so
           popular!
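
The narrowing of the interval in formula 3.9 can be illustrated numerically. The following is a minimal sketch in Python using only the standard library; the function name `z_interval` and the values chosen for the sample mean and for σ are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, alpha=0.05):
    """Two-sided confidence interval for the mean with known sigma (formula 3.9)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # percentile z_{1-alpha/2}
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical sample mean and known standard deviation.
x_bar, sigma = 10.0, 2.0

widths = {}
for n in (25, 100, 400):
    lo, hi = z_interval(x_bar, sigma, n)
    widths[n] = hi - lo
    print(f"n = {n:3d}: ({lo:.3f}, {hi:.3f}), width = {hi - lo:.3f}")
```

Since the half-width is proportional to 1/√n, each fourfold increase of n halves the interval width.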
              In practice one usually does not know the exact value of σ and uses the previously
           mentioned (2.3.2) point estimate s instead. In this case, the sampling distribution is
           no longer the normal distribution. However, taking into account Property 3
           described in section B.2.8, the following random variable:

$$ T_{n-1} = \frac{\bar{X} - \mu}{s/\sqrt{n}}, $$

           has a Student's t distribution with df = n – 1 degrees of freedom. The sample
           standard deviation of $\bar{X}$, $s/\sqrt{n}$, is known as the standard error of the
           statistic $\bar{x}$ and denoted SE.
              We now compute the 1−α/2 percentile for the Student's t distribution with
           df = n – 1 degrees of freedom:

$$ T_{n-1}(t) = 1 - \alpha/2 \;\Rightarrow\; t_{df,\,1-\alpha/2}, \tag{3.10} $$

           and use this percentile in order to establish the two-sided confidence interval:

$$ -t_{df,\,1-\alpha/2} \;<\; \frac{\bar{x} - \mu}{SE} \;<\; t_{df,\,1-\alpha/2}, \tag{3.11} $$

           or, equivalently:

$$ \bar{x} - t_{df,\,1-\alpha/2}\,SE \;<\; \mu \;<\; \bar{x} + t_{df,\,1-\alpha/2}\,SE. \tag{3.12} $$

              Since the Student's t distribution is less peaked than the normal distribution, one
           obtains wider intervals when using formula 3.12 than when using formula 3.9,
           reflecting the added uncertainty about the true value of the standard deviation.
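
As an illustration of formula 3.12, the following minimal Python sketch computes a 95% confidence interval from a small sample and compares its half-width with the one obtained by using the normal percentile instead. Everything in it is an assumption made for the example: the data values are invented, and the helpers `t_pdf`, `t_cdf` and `t_quantile` are hypothetical. They obtain the Student's t percentile by numerical integration and bisection so that only the standard library is needed; a statistical package would normally provide this percentile directly.

```python
import math
from statistics import NormalDist, mean, stdev


def t_pdf(x, df):
    """Density of the Student's t distribution (evaluated in log space)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Distribution function for x >= 0, using Simpson's rule on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Percentile t_{df,p} for 0.5 <= p < 1, found by bisection."""
    if not 0.5 <= p < 1:
        raise ValueError("this sketch only handles 0.5 <= p < 1")
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # expand the bracket until it contains the percentile
        lo, hi = hi, 2 * hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Invented sample, used only to exercise the formulas.
data = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 4.8, 5.0, 5.1, 4.9]
alpha = 0.05
n = len(data)
x_bar = mean(data)
se = stdev(data) / math.sqrt(n)  # standard error SE = s / sqrt(n)

t_half = t_quantile(1 - alpha / 2, n - 1) * se          # half-width, formula 3.12
z_half = NormalDist().inv_cdf(1 - alpha / 2) * se       # normal percentile, for comparison

print(f"t interval: ({x_bar - t_half:.4f}, {x_bar + t_half:.4f})")
print(f"half-widths: t = {t_half:.4f}, normal = {z_half:.4f}")
```

For a sample this small (df = 9) the t-based half-width is noticeably larger than the one based on the normal percentile.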
              When applying these results one must note that:

              –  For large n, the Central Limit theorem (see sections A.8.4 and A.8.5)
                 legitimises the assumption of normal distribution of $\bar{X}$ even when X is not
                 normally distributed (under very general conditions).
              –  For large n, the Student's t distribution does not deviate significantly from
                 the normal distribution, and one can then use, for unknown σ, the same
                 percentiles derived from the normal distribution, as one would use in the
                 case of known σ.
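
The second remark can be checked numerically by comparing the 97.5% percentiles of the Student's t distribution with the corresponding normal percentile as the number of degrees of freedom grows. The sketch below is written under stated assumptions: the helper names `t_pdf`, `t_cdf` and `t_quantile` are hypothetical, the chosen df values are arbitrary, and the t percentile is obtained by numerical integration and bisection only to keep the example free of external libraries.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the Student's t distribution (evaluated in log space)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Distribution function for x >= 0, using Simpson's rule on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Percentile t_{df,p} for 0.5 <= p < 1, found by bisection."""
    if not 0.5 <= p < 1:
        raise ValueError("this sketch only handles 0.5 <= p < 1")
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # expand the bracket until it contains the percentile
        lo, hi = hi, 2 * hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)  # normal percentile z_{0.975}
gaps = []
for df in (5, 30, 100, 1000):
    t = t_quantile(0.975, df)
    gaps.append(t - z)
    print(f"df = {df:4d}: t = {t:.4f}, z = {z:.4f}, difference = {t - z:.4f}")
```

The difference between the two percentiles shrinks steadily as df increases, which is what justifies using the normal percentiles for large n.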