Page 38 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R


           because it usually achieves a “reasonable” tolerance in our conclusions (say,
           ε < 0.05) for a not too large sample size (say, n > 200), and it works well in
           many applications. For problems where a high risk can have serious
           consequences, one should choose a higher confidence level, 99% for example.
           Notice that arbitrarily small risks (arbitrarily small “reasonable doubt”) are
           often impractical. As a matter of fact, a zero risk (no “doubt” at all) usually
           means either an infinitely large, and therefore useless, tolerance, or an
           infinitely large, and therefore prohibitive, sample. A compromise achieving a
           useful tolerance with an affordable sample size has to be found.
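           This trade-off can be made concrete with a small sketch. As an illustration
           not taken from the text, suppose we estimate a population proportion: under
           the usual normal approximation, the worst-case sample size needed to keep the
           estimate within a tolerance ε at a given confidence level grows quadratically
           as ε shrinks. The function name and the chosen scenario below are assumptions
           for illustration only:

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(eps, confidence):
    # Worst-case (p = 0.5) sample size so that an estimated proportion
    # falls within +/- eps of the true value at the given confidence
    # level, via the normal approximation: n >= (z / (2 * eps))^2.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil((z / (2 * eps)) ** 2)

# Tightening the tolerance or raising the confidence inflates n quickly:
print(sample_size_for_proportion(0.05, 0.95))  # -> 385
print(sample_size_for_proportion(0.01, 0.95))  # -> 9604
print(sample_size_for_proportion(0.05, 0.99))  # -> 664
```

           Note how halving the tolerance roughly quadruples the required sample, while
           a tolerance approaching zero drives the sample size toward infinity, which is
           exactly the “zero risk is prohibitive” point made above.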


           1.6 Statistical Significance and Other Significances


           Statistics is surely a recognised and powerful data analysis tool. Because of
           this power and its pervasive influence in science and human affairs, people
           tend to look to statistics as a sort of recipe book, from which one can pick
           a recipe for the problem at hand. Things get worse when using statistical
           software, and particularly in inferential data analysis. Many papers and
           publications are plagued with the “computer dixit” syndrome when reporting
           statistical results. People tend to lose all critical sense even in such a
           risky endeavour as trying to reach a general conclusion (a law) based on a
           data sample: the inferential or inductive reasoning.
              In the book of A. J. Jaffe and Herbert F. Spirer (Jaffe AJ, Spirer HF 1987) many
           misuses of statistics are presented and discussed in detail. These authors identify
           four common sources of misuse: incorrect or flawed data; lack of knowledge of the
           subject matter; faulty,  misleading,  or imprecise interpretation of the data and
           results; incorrect or inadequate analytical  methodology.  In the present book we
           concentrate on how to choose adequate analytical methodologies and give precise
           interpretation of the results. Besides theoretical explanations and words of caution
           the book includes a large number of examples that in our opinion help to solidify
           the notions of adequacy and of precise interpretation of the data and the results.
           The other two sources of misuse, flawed data and lack of knowledge of the
           subject matter, are the responsibility of the practitioner.
              In what concerns statistical inference, the reader must take extra care not
           to apply statistical methods in a mechanical and mindless way, accepting the
           software results uncritically. Let us consider as an example the comparison of
           foetal heart rate baseline measurements proposed in Exercise 4.11. The heart
           rate “baseline” is roughly the most stable heart rate value (expressed in
           beats per minute, bpm), after discarding rhythm acceleration or deceleration
           episodes. The comparison proposed in Exercise 4.11 concerns measurements
           obtained in 1996 against those obtained in other years (CTG dataset samples).
           Now, the popular two-sample t-test presented in chapter 4 does not detect a
           statistically significant difference between the means of the measurements
           performed in 1996 and those performed in other years. Had a statistically
           significant difference been detected, would it mean that the 1996 foetal
           population was different, in that respect, from the