In ophthalmic research (as in other medical areas), researchers very often declare that there is a statistical difference (or significance) without discussing the size of the difference; or they find no statistically significant difference between treatments when in reality the study may simply have lacked the power to detect a difference [13]. Clinical trials today are typically powered at over 85% (ideally 90%): they cost a lot to run and are very time consuming, and the last thing anyone wants is an inconclusive trial in which no difference is found but there may have been insufficient power to detect a difference of real clinical impact. Whenever a study declares non-significance, identify the research question, identify the null hypothesis, and compute the effect size with a 95% confidence interval; then consider the implications of changing practice if in reality the truth were the upper or lower bound of that confidence interval.
An example of this might be a trial exploring whether or not posturing is needed for patients undergoing vitrectomy surgery. The null hypothesis here would be that there is no difference between the risk of failure in patients posturing face down after surgery and the risk of failure in patients not posturing after surgery. Suppose a clinical trial is then conducted with 200 patients in each arm of the study. In the face-down group one patient requires additional surgery because their macular hole re-opens. In the non-posturing group, two patients require repeat surgery. The odds ratio for this would be 2.02 with a 95% confidence interval of (0.18, 22.3), and the P-value would be 0.999. This is a statistically non-significant result, but does it mean that there is no requirement for patients to posture after surgery? There were twice as many patients in the non-posturing group who required repeat surgery, and if we look at the confidence interval we see that although the data are consistent with there being no difference in risk between trial arms, there is much uncertainty attached to the estimate, as evidenced by the very wide confidence interval.
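For concreteness, a minimal sketch of that calculation in Python (assuming NumPy and SciPy are available; the Wald interval on the log-odds scale is one standard way to form the confidence interval, Fisher's exact test is one way to obtain a P value, and the exact figures may differ slightly from those quoted above depending on the method used):

```python
import numpy as np
from scipy.stats import fisher_exact

# 2x2 table from the example: rows are trial arms,
# columns are [repeat surgery, no repeat surgery]
table = np.array([[2, 198],    # non-posturing: 2 of 200 failed
                  [1, 199]])   # face-down:     1 of 200 failed

a, b = table[0]
c, d = table[1]

# Sample odds ratio and a Wald 95% CI computed on the log-odds scale
or_hat = (a * d) / (b * c)
se_log_or = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lo, hi = np.exp(np.log(or_hat) + np.array([-1.96, 1.96]) * se_log_or)

# Exact P value for the 2x2 table
_, p = fisher_exact(table)
print(f"OR = {or_hat:.2f}, 95% CI ({lo:.2f}, {hi:.2f}), P = {p:.3f}")
# -> OR ~ 2.0, 95% CI ~ (0.18, 22.3), P ~ 1.0: the interval is very wide
```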
Another important issue is multiple hypothesis testing. When we conduct a test of significance at the conventional 5% significance level, we have a 1 in 20 chance (probability 0.05) of concluding that there is significance when there is no real difference, and a 1 − 0.05 = 0.95 chance of concluding that there is no difference. If we conduct two independent tests of significance, the probability that neither is statistically significant is 0.95 × 0.95 ≈ 0.90. If we were to conduct 5 independent tests of significance, the probability that none are statistically significant would be 0.95^5 ≈ 0.774, and the probability that at least one is significant is 1 − 0.774 = 0.226.
This means that if many tests of significance are conducted, it is highly likely that there will be a spurious statistically significant result. This is called the multiplicity issue, or the problem of multiple comparisons or multiple testing.
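The arithmetic above generalizes directly; a short illustrative sketch (plain Python, assuming nothing beyond the formula 1 − (1 − α)^m for m independent tests):

```python
# Probability of at least one spurious "significant" result among m
# independent tests, each conducted at the 5% level: 1 - 0.95**m
alpha = 0.05
for m in (1, 2, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** m
    print(f"{m:2d} tests: P(at least one false positive) = {fwer:.3f}")
# -> 1 test: 0.050, 2: 0.098, 5: 0.226, 10: 0.401, 20: 0.642
```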
To deal with this problem, statistical adjustments can be applied. Perhaps the most widely known is the Bonferroni adjustment, which simply divides the significance level by the number of tests conducted (equivalently, multiplies each P value by the number of tests) [14]. This adjustment has the disadvantage that it over-corrects, i.e. it is very conservative, when there are a large number of tests or when the tests (more precisely, the test statistics) are positively correlated. This is because the Bonferroni adjustment assumes that each test is independent of all the other tests.
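As a rough illustration of the adjustment, a sketch assuming statsmodels is available; the P values here are hypothetical, made up purely for illustration:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical P values from 5 significance tests
pvals = np.array([0.012, 0.049, 0.21, 0.004, 0.38])

# Bonferroni: compare each P value against alpha / m
# (equivalently, multiply each P value by m and cap at 1)
alpha, m = 0.05, len(pvals)
reject_manual = pvals < alpha / m

# The same adjustment via statsmodels
reject, p_adj, _, _ = multipletests(pvals, alpha=alpha, method="bonferroni")

print(reject_manual)         # [False False False  True False]
print(np.round(p_adj, 3))    # [0.06  0.245 1.    0.02  1.   ]
```

Note that 0.012 and 0.049 would be "significant" at the unadjusted 5% level but survive neither form of the correction; only the smallest P value (0.004 < 0.05/5 = 0.01) remains significant.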
In MRI research, random field theory is used to correct for