Page 108 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

3.2 Estimating a Mean


              Several values of n, from 20 to 30, are considered “large” in the
           literature. As far as the normality assumption of the sample mean $\bar{X}$ is
           concerned, n = 20 is usually enough. As to the deviation between $z_{1-\alpha/2}$ and
           $t_{1-\alpha/2}$, it is about 5% for n = 25 and α = 0.05. In what follows, we use the
           threshold n = 25 to distinguish small samples from large samples. Therefore,
           when estimating a mean we adopt the following procedure:

              1.  Large sample (n ≥ 25): Use formula 3.9 (substituting s for σ) or 3.12 (if
                 improved accuracy is needed). No normality assumption of X is needed.
              2.  Small sample (n < 25) and the population distribution can be assumed to be
                 normal: Use formula 3.12.

              For simplicity, most software products use formula 3.12 irrespective of the
           value of n (for small n, the normality assumption has to be checked using the
           goodness-of-fit tests described in section 5.1).
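
The deviation of about 5% between the normal and Student's t percentiles quoted above for n = 25 can be checked numerically. The following Python sketch is an illustration only and is not taken from the book's software. It assumes n − 1 degrees of freedom for the t percentile, and the helper names (`t_pdf`, `t_cdf`, `t_quantile`) are hypothetical. It relies on the standard library alone, obtaining the t percentile by numerical integration and bisection.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via composite Simpson's rule on [0, |x|], using symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Percentile of Student's t by bisection; assumes 0.5 <= p < 1."""
    lo, hi = 0.0, 100.0  # search bracket, an assumption adequate for the cases below
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal percentile
for n in (10, 20, 25, 30, 50):
    t = t_quantile(1 - alpha / 2, n - 1)
    print(f"n = {n:2d}: z = {z:.3f}, t = {t:.3f}, deviation = {(t - z) / z:.1%}")
```

The relative deviation shrinks as n grows, which is why a threshold on n is a reasonable way to separate the two cases of the procedure.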

           Example 3.1
           Q: Consider the data relative to the variable PRT for the first class (CLASS=1) of
           the Cork Stoppers’ dataset. Compute the 95% confidence interval of its
           mean.

           A: There are n = 50 cases. The sample mean and sample standard deviation are
           $\bar{x} = 365$ and $s = 110$, respectively. The standard error is $SE = s/\sqrt{n} = 15.6$. We
           apply formula 3.12, obtaining the confidence interval:

              $\bar{x} \pm t_{49,\,0.975} \times SE = \bar{x} \pm 2.01 \times 15.6 = 365 \pm 31.$

              Notice that this confidence interval corresponds to a tolerance of 31/365 ≈ 8%.
           If, in this large-sample situation, we used the normal approximation formula 3.9,
           we would obtain a very close result.
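
The arithmetic of Example 3.1 can be retraced with a few lines of Python. This is only a sketch: it takes the summary values and the percentile 2.01 as quoted in the text, and it assumes that formula 3.9 is the normal-percentile interval (sample mean ± z × SE, with s in place of σ), since the formula itself is not reproduced on this page. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n, xbar, s = 50, 365.0, 110.0      # summary values quoted in Example 3.1
t_49 = 2.01                        # t percentile (49 d.f., 0.975) as quoted in the text

se = s / math.sqrt(n)              # standard error of the mean
half_t = t_49 * se                 # half-width using the t percentile (formula 3.12)
z = NormalDist().inv_cdf(0.975)    # standard normal percentile, about 1.96
half_z = z * se                    # half-width under the assumed form of formula 3.9

print(f"SE = {se:.1f}")
print(f"t-based: {xbar:.0f} ± {half_t:.0f} -> [{xbar - half_t:.0f}, {xbar + half_t:.0f}]")
print(f"z-based: {xbar:.0f} ± {half_z:.1f}")
print(f"tolerance = {half_t / xbar:.1%}")
```

Under these assumptions the two half-widths differ by less than one PRT unit, in line with the remark that the normal approximation gives a very close result.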
              Given the interpretation of a confidence interval (sections 3.1 and 1.5), we
           expect that in a large number of repetitions of 50 PRT measurements, under the
           same conditions used for the presented dataset, confidence intervals such as the
           one we have derived will cover the true PRT mean 95% of the time. In other
           words, when presenting [334, 396] as a confidence interval for the PRT mean, we
           incur only a 5% risk of being wrong by basing our estimate on an atypical
           dataset.
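
The repeated-sampling interpretation can be illustrated with a small simulation. The sketch below is hypothetical throughout: it assumes a normal population whose true mean and standard deviation equal the sample values of Example 3.1 (365 and 110), which the text does not claim, and it reuses the percentile 2.01 quoted there. It counts how often the interval computed from each simulated sample of 50 values covers the assumed true mean.

```python
import math
import random
from statistics import mean, stdev

random.seed(1)                       # fixed seed so the run is reproducible

TRUE_MEAN, TRUE_SD = 365.0, 110.0    # assumed population parameters (illustrative only)
N, T_49 = 50, 2.01                   # sample size and t percentile quoted in Example 3.1
REPETITIONS = 2000

covered = 0
for _ in range(REPETITIONS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    xbar = mean(sample)
    se = stdev(sample) / math.sqrt(N)        # stdev uses the n - 1 divisor
    if xbar - T_49 * se <= TRUE_MEAN <= xbar + T_49 * se:
        covered += 1

coverage = covered / REPETITIONS
print(f"Intervals covering the true mean: {coverage:.1%}")
```

The observed proportion should be close to 95%, varying slightly with the seed and the number of repetitions.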

           Example 3.2
           Q: Consider the subset of the previous PRT data constituted by the first n = 20
           cases. Compute the 95% confidence interval of its mean.
           A: The sample mean and sample standard deviation are now $\bar{x} = 351$ and $s = 83$,
           respectively. The standard error is $SE = s/\sqrt{n} = 18.56$. Since n = 20, we apply the
           small sample estimate formula 3.12 assuming that the PRT distribution can be well