because it usually achieves a “reasonable” tolerance in our conclusions (say,
ε < 0.05) for a not too large sample size (say, n > 200), and it works well in many
applications. For problem types where taking a high risk can have serious
consequences, one would instead choose a higher confidence level, 99% for example.
Notice that arbitrarily small risks (arbitrarily small “reasonable doubt”) are often
impractical. As a matter of fact, a zero risk (no “doubt” at all) usually means
either an infinitely large, and therefore useless, tolerance, or an infinitely large,
and therefore prohibitive, sample. A compromise value achieving a useful tolerance
with an affordable sample size has to be found.
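As a rough numerical sketch of this compromise (not an example taken from the book), suppose we estimate a proportion using the usual normal approximation with the worst-case variance p(1 − p) ≤ 1/4. In R, the minimum sample size guaranteeing a tolerance ε at confidence level 1 − α could then be computed as:

    # Worst-case sample size for estimating a proportion within tolerance eps
    # at confidence level 1 - alpha (normal approximation, p(1-p) <= 1/4).
    n_needed <- function(eps, alpha) ceiling((qnorm(1 - alpha/2) / (2 * eps))^2)

    n_needed(0.05, 0.05)    # 95% confidence, eps = 0.05  ->  385
    n_needed(0.05, 0.01)    # 99% confidence, same eps    ->  664
    n_needed(0.005, 0.01)   # shrinking eps tenfold       ->  66349

Halving the tolerance roughly quadruples the required sample, which is why demanding both an arbitrarily small risk and an arbitrarily small tolerance quickly becomes prohibitive.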
1.6 Statistical Significance and Other Significances
Statistics is surely a recognised and powerful data analysis tool. Because of this
power and its pervasive influence in science and human affairs, people tend to look
at statistics as a sort of recipe book from which one can pick a recipe for the
problem at hand. Things get worse when statistical software is used, particularly
in inferential data analysis. Many papers and publications are plagued by the
“computer dixit” syndrome when reporting statistical results. People tend to lose
all critical sense, even in such a risky endeavour as trying to reach a general
conclusion (a law) based on a data sample: inferential or inductive reasoning.
In the book by A. J. Jaffe and Herbert F. Spirer (Jaffe AJ, Spirer HF, 1987), many
misuses of statistics are presented and discussed in detail. These authors identify
four common sources of misuse: incorrect or flawed data; lack of knowledge of the
subject matter; faulty, misleading, or imprecise interpretation of the data and
results; incorrect or inadequate analytical methodology. In the present book we
concentrate on how to choose adequate analytical methodologies and give precise
interpretation of the results. Besides theoretical explanations and words of caution,
the book includes a large number of examples that, in our opinion, help to solidify
the notions of adequacy and of precise interpretation of the data and the results.
The other two sources of misuse – flawed data and lack of knowledge of the
subject matter – are the responsibility of the practitioner.
Concerning statistical inference, the reader must take extra care not to apply
statistical methods in a mechanical and mindless way, accepting the software
results uncritically. Let us consider as an example the comparison of
foetal heart rate baseline measurements proposed in Exercise 4.11. The heart rate
“baseline” is roughly the most stable heart rate value (expressed in beats per
minute, bpm), after discarding rhythm acceleration or deceleration episodes. The
comparison proposed in Exercise 4.11 concerns measurements obtained in 1996
against those obtained in other years (CTG dataset samples).
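As a simple sketch of how such a comparison could be set up in R (the numbers below are invented placeholders, not the actual CTG measurements), one could write:

    # Hypothetical illustration: comparing foetal heart rate baselines (bpm)
    # measured in 1996 against those of other years with a two-sample t-test.
    baseline_1996   <- c(134, 140, 138, 129, 142, 136, 131, 145, 137, 133)
    baseline_others <- c(137, 143, 135, 141, 139, 146, 132, 138, 144, 140)
    t.test(baseline_1996, baseline_others)   # Welch two-sample t-test by default

The output reports, among other things, the p-value on which the significance decision is based.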
Now, the popular two-sample t-test presented in chapter 4 does not detect a
statistically significant difference between the means of the measurements
performed in 1996 and those performed in other years. Had a statistically
significant difference been detected, would it mean that the 1996 foetal
population was different, in that respect, from the