Page 191 - Computational Retinal Image Analysis


186 CHAPTER 10 Statistics in ophthalmology

is captured within the design, the analysis must reflect the design by use of conditional
logistic regression or another analysis that preserves the matching.
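As a minimal sketch of an analysis that preserves matching, consider 1:1 matched pairs
with a single binary exposure. In that special case the conditional maximum-likelihood
odds ratio reduces to the ratio of the two kinds of discordant pairs, and McNemar's test
uses the same counts. The function name, the data layout and the counts below are
hypothetical and serve only as an illustration; a matched analysis with covariates would
normally use conditional logistic regression in a statistical package.

```python
def matched_pair_summary(pairs):
    """Summarise 1:1 matched case-control pairs with a binary exposure.

    pairs: list of (case_exposed, control_exposed) booleans, one tuple per pair.
    Returns the discordant counts, the matched odds ratio and McNemar's
    chi-square statistic (without continuity correction).
    """
    # b: case exposed, control unexposed; c: control exposed, case unexposed.
    b = sum(1 for case, control in pairs if case and not control)
    c = sum(1 for case, control in pairs if control and not case)
    if b + c == 0:
        raise ValueError("no discordant pairs: nothing to estimate")
    # Conditional ML estimate of the odds ratio for this special case.
    odds_ratio = b / c if c > 0 else float("inf")
    mcnemar_chi2 = (b - c) ** 2 / (b + c)
    return b, c, odds_ratio, mcnemar_chi2


# Hypothetical counts, invented for illustration only.
pairs = (
    [(True, True)] * 10      # both exposed (concordant)
    + [(True, False)] * 20   # only the case exposed (discordant)
    + [(False, True)] * 8    # only the control exposed (discordant)
    + [(False, False)] * 12  # neither exposed (concordant)
)

b, c, odds_ratio, chi2 = matched_pair_summary(pairs)
print(f"discordant pairs: {b} vs {c}, matched OR = {odds_ratio}, chi2 = {chi2:.2f}")
```

Only the discordant pairs enter the estimate, so the comparison is made within matched
pairs rather than between two pooled groups.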
Special caution is needed when choosing between parametric and non-parametric methods.
As we said before, statistical methods use probabilistic models to express how regularity
and variation in the data are to be understood. The variation can be expressed in two
ways. A parametric method assumes a particular distribution for the data (e.g. normal)
with a finite number of parameters (two in the case of the univariate normal
distribution). A non-parametric method does not impose a particular distributional form
on the data, so no such distribution parameters are involved. For example, if we compare
means using an independent two-sample t-test, the approach is parametric because it
assumes that the data are normally distributed, i.e. that the noise is normally
distributed with two parameters (mean zero and unknown variance). An alternative,
non-parametric method would be to use the Mann-Whitney test
(Table 3). When should we use parametric methods and when non-parametric methods? There
are two principles to remember. Firstly, where the assumptions of parametric methods are
plausible, possibly after a transformation of the data, parametric methods are
preferable because they provide extra power and allow adjustment for other factors [19].
For example, for a comparison of two samples, if the data are normal, or normal after a
log or square root transformation, we can use the two-sample t-test; otherwise the
Mann-Whitney test may be more appropriate [20]. Bootstrapping or resampling offers an
alternative to non-parametric methods, but such methods should not be considered without
reference to a statistician [21]. Secondly, it is important to remember that parametric
methods are often not robust to outliers, while non-parametric methods are in general
robust. Outliers may be identified more often in smaller samples, and small samples are
hard or impossible to check for normality, which leads to the misconception that
non-parametric methods are for small samples. This is wrong: the choice should depend on
whether the parametric assumptions are plausible, not on the sample size alone.
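The point about robustness to outliers can be illustrated with a small sketch. It
computes the pooled two-sample t statistic and the Mann-Whitney U statistic for two
hypothetical groups, then repeats the calculation after one value is replaced by a gross
outlier. The data and function names are invented for illustration, and only the test
statistics are computed; in practice a statistical package would supply the tests and
their p-values.

```python
import math
import statistics


def t_statistic(x, y):
    """Pooled-variance two-sample t statistic (equal variances assumed)."""
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    standard_error = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / standard_error


def mann_whitney_u(x, y):
    """Mann-Whitney U for x: number of pairs with x_i > y_j, ties count 0.5."""
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y
    )


# Hypothetical measurements, invented for illustration only.
group_a = [21.0, 22.5, 23.1, 24.0, 25.2, 26.3]
group_b = [15.2, 16.8, 17.5, 18.1, 19.4, 20.0]
# Same data, but the largest value in group A is replaced by a gross outlier.
group_a_outlier = group_a[:-1] + [260.3]

t_clean = t_statistic(group_a, group_b)
t_outlier = t_statistic(group_a_outlier, group_b)
u_clean = mann_whitney_u(group_a, group_b)
u_outlier = mann_whitney_u(group_a_outlier, group_b)

# Two-sided 5% critical value of the t distribution with 10 degrees of freedom.
T_CRITICAL = 2.228

print(f"t without outlier: {t_clean:.2f}, with outlier: {t_outlier:.2f}")
print(f"U without outlier: {u_clean}, with outlier: {u_outlier}")
```

Every value in group A exceeds every value in group B in both versions of the data, so
U stays at its maximum of 36. The outlier inflates the variance estimate, however, and
the t statistic drops from above the critical value of 2.228 to below it.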
There are several other important concepts that are often misunderstood. For
completeness, we briefly list them here and give references for further reading. One
such concept is the distinction between multivariable and multivariate methods [22].
Next, in a design with data at baseline and follow-up, we need to be careful to
incorporate the baseline values [23].
Next, caution is needed when adjusting for confounders. We need to be careful
about the criteria for control of confounding [24]. Another area of misunderstanding
relates to the choice of the statistical model. In choosing the model, we need to
be clear about the goal of the analysis. Often the goal is either inference
(e.g. to explain) or estimation of future values of the outcome (e.g. to predict). These
two goals may seem similar, but there are several differences that translate into how
a model should be chosen [25]. Another commonly misunderstood area is in using
the correlation coefficient to evaluate agreement [2, 26]. A final caution is for
all of us to remember: “All models are wrong but some are useful” (George Box).
His point was that we should focus on finding a model that is useful for answering
a particular real-life question, rather than on finding a model that will be
correct in all scenarios and answer all questions.
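To illustrate the earlier caution about using the correlation coefficient to evaluate
agreement, the sketch below compares two hypothetical devices whose readings are
perfectly correlated yet clearly disagree. It reports the Pearson correlation alongside
the mean difference and limits of agreement computed as mean difference plus or minus
1.96 standard deviations. This is one common way to summarise agreement and is an
assumption here, not necessarily the approach of the cited references; the device names
and readings are invented.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def limits_of_agreement(x, y):
    """Mean difference (y - x) and limits of agreement: bias +/- 1.96 SD."""
    differences = [b - a for a, b in zip(x, y)]
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired readings, invented for illustration only.
device_a = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
# Device B is an exact linear function of device A, so the correlation is
# perfect, but it reads systematically higher.
device_b = [2 * value + 5 for value in device_a]

r = pearson_r(device_a, device_b)
bias, lower, upper = limits_of_agreement(device_a, device_b)

print(f"Pearson r = {r:.3f}")
print(f"mean difference = {bias:.1f}, limits of agreement = ({lower:.1f}, {upper:.1f})")
```

The correlation equals 1 because the readings are linearly related, while the mean
difference of 20 units shows that the two devices cannot be used interchangeably.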
