In ophthalmic research (as well as other medical areas), researchers very often
declare that a difference is statistically significant but do not discuss the size of
that difference; or they find no statistically significant difference between treatments
when in reality the study may simply have lacked the power to detect one [13].
Clinical trials today are typically powered at over 85% (ideally 90%): they are costly
and time consuming to run, and the last thing anyone wants is an inconclusive trial
in which no difference is found but there may have been insufficient power to detect
a difference of real clinical impact. Whenever a study declares non-significance,
identify the research question, identify the null hypothesis, and compute the effect
size with a 95% confidence interval; then consider the implications of changing
practice if in reality the truth lay at the upper or lower bound of that confidence
interval.
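As a rough illustration of how such power requirements translate into trial size, the
sketch below applies the standard normal-approximation sample-size formula for
comparing two proportions; the failure risks of 10% and 5% are invented purely for
illustration.

    from scipy.stats import norm

    def n_per_arm(p1, p2, alpha=0.05, power=0.90):
        # Approximate sample size per arm for a two-sided comparison
        # of two proportions using the normal approximation.
        z_alpha = norm.ppf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
        z_beta = norm.ppf(power)            # about 1.28 for 90% power
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

    # Hypothetical failure risks of 10% vs 5% need roughly 578 patients per arm
    print(round(n_per_arm(0.10, 0.05)))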
An example of this might be a trial exploring whether or not posturing is needed
for patients undergoing vitrectomy surgery. The null hypothesis here would be that
there is no difference between the risk of failure in patients posturing face down
after surgery and the risk of failure in patients not posturing after surgery. Suppose
a clinical trial is then conducted with 200 patients in each arm of the study. In the
face-down group one patient requires additional surgery because their macular hole
re-opens. In the non-posturing group, two patients require repeat surgery. The odds
ratio for this would be 2.02, with a 95% confidence interval of (0.18, 22.3), and the
P-value would be 0.999. This is a statistically nonsignificant result, but does that mean that
there is no requirement for patients to posture after surgery? Twice as many patients
in the non-posturing group required repeat surgery, and although the data are
consistent with there being no difference in risk between the trial arms, there is
much uncertainty attached to the estimate, as reflected in the very wide confidence
interval.
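For readers who wish to reproduce these figures, the sketch below computes the
sample odds ratio, a Woolf (log) 95% confidence interval, and a two-sided Fisher's
exact P-value for this 2 × 2 table; small differences from the quoted values may arise
from rounding or from the particular test used.

    import math
    from scipy.stats import norm, fisher_exact

    # Hypothetical trial: rows are groups, columns are (repeat surgery, no repeat)
    a, b = 2, 198    # non-posturing group
    c, d = 1, 199    # face-down group

    or_hat = (a * d) / (b * c)              # sample odds ratio, about 2.01
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    z = norm.ppf(0.975)                     # about 1.96
    ci_low = math.exp(math.log(or_hat) - z * se_log_or)    # about 0.18
    ci_high = math.exp(math.log(or_hat) + z * se_log_or)   # about 22.3

    _, p_value = fisher_exact([[a, b], [c, d]], alternative="two-sided")
    print(or_hat, (ci_low, ci_high), p_value)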
Another important issue is multiple hypothesis testing. When we conduct a test
of significance at the conventional 5% significance level we have a 1 in 20 chance
(or 0.05 probability) of concluding that there is significance when there is no real
difference, and a 1 − 0.05 = 0.95 chance of correctly concluding that there is no
difference. If we conduct two independent tests of statistical significance, the
probability that neither is statistically significant is 0.95 × 0.95 ≈ 0.90. If we were
to conduct 5 independent tests of significance, the probability that none are
statistically significant would be 0.95^5 ≈ 0.774, and the probability that at least
one is significant is 1 − 0.95^5 ≈ 0.226. This means that if many
tests of significance are conducted, it is highly likely that there will be a spurious
statistically significant result. This is called the multiplicity issue, or the problem of
multiple comparisons or multiple testing.
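A short calculation makes the point concrete; the sketch below shows how the
chance of at least one spurious "significant" result grows with the number of
independent tests, assuming all null hypotheses are true and each test is carried out
at the 5% level.

    alpha = 0.05
    for m in (1, 2, 5, 10, 20, 100):
        p_none = (1 - alpha) ** m    # probability that no test is significant
        p_any = 1 - p_none           # probability of at least one false positive
        print(m, round(p_any, 3))
    # m = 5 gives 0.226; m = 20 gives 0.642; m = 100 gives 0.994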
To deal with this problem, there are statistical adjustments that can be applied. The
one perhaps most widely known is the Bonferroni adjustment, which simply divides
the significance level by the number of tests conducted (equivalently, each P value
is multiplied by the number of tests) [14].
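As a minimal sketch of how the adjustment is applied in practice (the P values below
are made up for illustration), each P value is compared with the significance level
divided by the number of tests, or equivalently each P value is multiplied by the
number of tests:

    p_values = [0.012, 0.049, 0.003, 0.210, 0.038, 0.650]   # hypothetical P values
    alpha = 0.05
    m = len(p_values)
    threshold = alpha / m                 # Bonferroni-adjusted significance level
    for p in p_values:
        adjusted = min(p * m, 1.0)        # Bonferroni-adjusted P value
        print(p, adjusted, "significant" if p < threshold else "not significant")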
This adjustment has disadvantages in that it over-corrects, i.e., it is very conservative,
if there are a large number of tests or if the tests (more precisely, the test statistics)
are positively correlated. This is because the Bonferroni adjustment assumes that
each test is independent of all the other tests. In MRI research, random field theory
is used to correct for