Page 39 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R

1 Introduction


population of other years? Common sense (and other senses as well) rejects such a claim. If a statistically significant difference were detected, one should look carefully at the conditions surrounding the data collection: can the samples be considered random? Maybe the 1996 sample was collected from at-risk foetuses with lower baseline measurements; and so on. As a matter of fact, when dealing with large samples even a small compositional difference may sometimes produce statistically significant results. For instance, for the sample sizes of the CTG dataset even a difference as small as 1 bpm produces a result usually considered statistically significant (p = 0.02). However, obstetricians only attach practical meaning to rhythm differences above 5 bpm; i.e., the statistically significant difference of 1 bpm has no practical significance.
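As an illustration of this effect (a sketch, not taken from the book), a large-sample z-test can be run on hypothetical summary statistics. The group means, standard deviation and group sizes below are assumed for illustration only, not the real CTG values:

```python
import math

def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    """Large-sample z-test for the difference of two means (two-sided)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# Hypothetical summary statistics (NOT the real CTG values): two groups of
# 1000 foetal heart-rate baselines (bpm) whose means differ by only 1 bpm.
z, p = two_sample_z(137.0, 138.0, 10.0, 10.0, 1000, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")             # p falls below the usual 0.05 cut-off
print("clinically relevant?", abs(138.0 - 137.0) > 5)
```

With a thousand cases per group, a shift of just 1 bpm already yields p below 0.05, while the 5 bpm threshold of practical relevance is nowhere near being met.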
Inferring causality from data is an even riskier endeavour than simple comparison. An often-encountered example is the inference of causality from a statistically significant but spurious correlation. We give more details on this issue in section 4.4.1.
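A classic way to see spurious correlation at work (a simulation sketch, with all numbers chosen for illustration) is to correlate pairs of independent random walks: although the series are completely unrelated, their trending behaviour routinely produces large correlation coefficients, whereas pairs of white-noise series almost never do:

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def random_walk(rng, n):
    """Cumulative sum of i.i.d. Gaussian steps: a trending series."""
    walk, pos = [], 0.0
    for _ in range(n):
        pos += rng.gauss(0, 1)
        walk.append(pos)
    return walk

rng = random.Random(0)
trials, n = 200, 200
big_r_walks = 0   # independent random walks with |r| > 0.5
big_r_noise = 0   # independent white-noise series with |r| > 0.5
for _ in range(trials):
    if abs(pearson_r(random_walk(rng, n), random_walk(rng, n))) > 0.5:
        big_r_walks += 1
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    if abs(pearson_r(x, y)) > 0.5:
        big_r_noise += 1
print(f"|r| > 0.5 for {big_r_walks}/{trials} pairs of independent random walks")
print(f"|r| > 0.5 for {big_r_noise}/{trials} pairs of independent white noise")
```

A naive significance test would happily bless many of those random-walk correlations, even though no causal (or indeed any) relationship exists between the series.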
One must also be very careful when performing goodness of fit tests. A common example is the normality assessment of a data distribution. A vast number of papers can be found in which the authors conclude the normality of data distributions based on very small samples. (We have found a paper presented at a congress where the authors claimed the normality of a data distribution based on a sample of four cases!) As explained in detail in section 5.1.6, even with samples of size 25 one would often be wrong in admitting that a data distribution is normal because a statistical test didn't reject that possibility at a 95% confidence level. Worse: one would often be accepting the normality of data generated from asymmetrical and even bimodal distributions! Data distribution modelling is a difficult problem that usually requires large samples, and even so one must bear in mind that, most of the time and beyond a reasonable doubt, one only has evidence of a model; the true distribution remains unknown.
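This phenomenon is easy to reproduce by simulation (a sketch only; the book itself works with other tests and software, and the Jarque-Bera statistic below is merely a stand-in for the normality tests discussed in section 5.1.6). Small samples are drawn from a clearly bimodal mixture, yet most of them pass the normality test:

```python
import math
import random

def jarque_bera(sample):
    """Jarque-Bera normality statistic (asymptotic chi-square, 2 df)."""
    n = len(sample)
    m = sum(sample) / n
    m2 = sum((x - m) ** 2 for x in sample) / n
    m3 = sum((x - m) ** 3 for x in sample) / n
    m4 = sum((x - m) ** 4 for x in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

rng = random.Random(1)
trials, n, crit = 500, 25, 5.99   # chi-square(2) critical value at the 5% level
passed = 0
for _ in range(trials):
    # clearly bimodal mixture: half N(-2, 1), half N(+2, 1)
    sample = [rng.gauss(-2 if rng.random() < 0.5 else 2, 1) for _ in range(n)]
    if jarque_bera(sample) < crit:
        passed += 1
print(f"{passed}/{trials} bimodal samples of size {n} pass the normality test")
```

The test fails to reject normality for the large majority of these samples: with only 25 cases, "not rejected" is very weak evidence that the distribution is actually normal.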
Another misuse of inferential statistics arises in the assessment of classification or regression models. Many people, when designing a classification or regression model that performs very well in a training set (the set used in the design), suffer from a kind of love-at-first-sight syndrome that leads them to neglect or relax the evaluation of their models in test sets (independent of the training sets). The research literature is full of examples of improperly validated models that are later abandoned when more data become available and the initial optimism collapses. The love-at-first-sight syndrome is even stronger when using computer software that automatically searches for the best set of variables describing the model. The book by Chamont Wang (Wang C, 1993), which contains many illustrations of and words of caution on the topic of inferential statistics, mentions an experiment in which 51 data samples were generated with 100 random numbers each, and a regression model was sought that would "explain" one of the data samples (playing the role of dependent variable) as a function of the other ones (playing the role of independent variables). The search ended by finding a regression model with a significant R-square and six significant coefficients at the 95% confidence level. In other words, a functional model was found explaining a relationship between noise and noise!
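Wang's experiment can be approximated by a much simpler sketch (illustrative only, not his exact procedure): screening many pure-noise predictors against a pure-noise response at the 5% level reliably flags several of them as "significant", which is precisely the trap behind automated variable search:

```python
import math
import random

def slope_t_stat(x, y):
    """t statistic for the slope of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b = sxy / sxx
    a0 = my - b * mx
    rss = sum((yi - (a0 + b * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b / se

rng = random.Random(2)
n, preds, runs, t_crit = 100, 50, 20, 1.984   # t(98) two-sided 5% cut-off
false_hits = 0
for _ in range(runs):
    y = [rng.gauss(0, 1) for _ in range(n)]        # pure-noise "response"
    for _ in range(preds):
        x = [rng.gauss(0, 1) for _ in range(n)]    # pure-noise "predictor"
        if abs(slope_t_stat(x, y)) > t_crit:
            false_hits += 1
print(f"{false_hits} of {runs * preds} noise predictors look significant at 5%")
```

Roughly one in twenty noise predictors clears the 5% hurdle by chance alone, so a search over many candidate variables is all but guaranteed to assemble a "significant" model out of nothing.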
           Such a model would collapse had proper validation been applied. In the present