Page 190 - Computational Retinal Image Analysis
4 On choosing the right statistical analysis method
What is the response of the statistician to the question of choosing the right sta-
tistical method? A good statistician will respond with clarifying questions: “Can you
give me some clinical background? What is the goal of your research study and do
you have a hypothesis?” Here the statistician will aim to find out whether your study is
exploratory (i.e. hypothesis generating), confirmatory (i.e. inferential or hypothesis
testing), or diagnostic (including prognostic or predictive). Then the statistician will
follow with questions on how you designed the study and how you collected the
data. Ideally, however, a statistician would be part of the study already and would
have been involved in the decision-making process when the study design was being
developed, and hence the statistician would not have to ask all these questions.
4.3 Words of caution in the data analysis method selection
Visualizing data is very important and underrated. In his famous Exploratory
Data Analysis, Tukey [16] wrote: “The greatest value of a picture is when it
forces us to notice what we never expected to see.” A misconception is that we
do not need the exploratory analysis if we are doing a confirmatory (inference)
study. Exploratory analyses (such as examining means and medians, histograms, and
pie charts) are crucial for research. They are often termed descriptive data
analysis methods (see Table 3). There are several reasons why we need them.
They help us to understand the data and to check for outliers and for any expected
or unexpected patterns. They help us to create the demographics tables and summaries
for the reports. They help to verify the distribution of the data so that we can make
an informed decision about the data analysis selection (Section 4). Furthermore, when
using a complex data analysis method (e.g. adjusted logistic regression), it is
essential to understand the way the results agree, or disagree, with those from
simpler methods (e.g. unadjusted logistic regression). Therefore, when we write
a research report, we are obliged to include both: the results from the simple data
analysis methods as well as the complex methods, so that reviewers can judge the
consistency between the results.
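As an illustration of the descriptive checks discussed above, the following is a minimal sketch using only the Python standard library. The intraocular pressure values are invented for illustration (including one deliberate data-entry error), the function names `describe` and `text_histogram` are hypothetical, and the 1.5 x IQR fence is one common convention for flagging outliers, not the only one.

```python
import math
import statistics


def describe(values):
    """Return basic descriptive statistics and values outside the 1.5 x IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "q1": q1,
        "q3": q3,
        "outliers": [v for v in values if v < low or v > high],
    }


def text_histogram(values, bin_width):
    """Return one text line per non-empty bin, labelled by the bin's lower edge."""
    counts = {}
    for v in values:
        start = math.floor(v / bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return [f"{start:6.1f} | {'#' * counts[start]}" for start in sorted(counts)]


# Hypothetical intraocular pressure readings in mmHg; 160.0 mimics a typing
# error (16.0 entered with a misplaced decimal point).
iop_mmhg = [14.0, 15.5, 16.0, 13.5, 17.0, 15.0, 16.5, 14.5, 18.0, 15.0, 160.0, 16.0]

summary = describe(iop_mmhg)
for name, value in summary.items():
    print(name, value)
for line in text_histogram(iop_mmhg, 5.0):
    print(line)
```

In this toy data set the mean is pulled far above the median by a single value, and both the fence rule and the histogram make that value visible before any inferential analysis is run.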
Sophisticated statistical analytic techniques are rarely able to compensate for
deficiencies in data collection. A common misconception is that a flaw in data
collection or in study design can be adjusted for via a complex statistical data
analysis method. It is indeed true that an alternative data analysis technique may
be able to help avoid some difficulty, such as by adjusting for confounders when
analyzing data from an observational study rather than controlling for confounders via
randomization or careful selection of subjects. However, there are many scenarios
where a complex statistical method will not help to rectify the flaws. For example,
in a study of the association between intraocular pressure and diabetes, a potential
confounder is systolic blood pressure. If we collect this confounder as dichotomous
data (or if we do not collect it at all), then the estimated association between
diabetes and intraocular pressure will be biased and/or the analysis underpowered [17].
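The effect of dichotomizing a confounder can be illustrated with a small simulation. The sketch below is not taken from [17]; every number in it (the blood pressure distribution, the 140 mmHg cut-off, the effect sizes, the sample size) is an assumption chosen for illustration, and for simplicity it uses ordinary linear regression on a continuous outcome. The adjusted coefficient is computed by regressing residuals on residuals (the Frisch-Waugh-Lovell result), which equals the diabetes coefficient from a regression that includes the covariate.

```python
import math
import random

TRUE_EFFECT = 1.0  # assumed effect of diabetes on IOP (mmHg) in this simulation


def mean(values):
    return sum(values) / len(values)


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (intercept included)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, x):
    """Residuals of y after a least-squares fit on x."""
    slope = ols_slope(x, y)
    mx, my = mean(x), mean(y)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def adjusted_slope(exposure, outcome, covariate):
    """Exposure coefficient adjusted for one covariate (residual-on-residual)."""
    return ols_slope(residuals(exposure, covariate), residuals(outcome, covariate))


rng = random.Random(0)
n = 10_000

# Systolic blood pressure (mmHg) raises both the chance of diabetes and the IOP,
# so it confounds the diabetes-IOP association. All values are simulated.
sbp = [rng.gauss(130.0, 15.0) for _ in range(n)]
diabetes = [
    1.0 if rng.random() < 1.0 / (1.0 + math.exp(-(s - 130.0) / 10.0)) else 0.0
    for s in sbp
]
iop = [
    15.0 + 0.1 * (s - 130.0) + TRUE_EFFECT * d + rng.gauss(0.0, 2.0)
    for s, d in zip(sbp, diabetes)
]
sbp_high = [1.0 if s > 140.0 else 0.0 for s in sbp]  # dichotomized confounder

unadjusted = ols_slope(diabetes, iop)
adjusted_continuous = adjusted_slope(diabetes, iop, sbp)
adjusted_dichotomous = adjusted_slope(diabetes, iop, sbp_high)

print("true effect:            ", TRUE_EFFECT)
print("unadjusted:             ", round(unadjusted, 2))
print("adjusted, continuous SBP:", round(adjusted_continuous, 2))
print("adjusted, SBP > 140 only:", round(adjusted_dichotomous, 2))
```

Under these assumptions, adjusting for the continuous measurement recovers an estimate close to the simulated effect, while adjusting for the dichotomized version removes only part of the confounding, and no analysis method can recover the information that was discarded at data collection.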
Often case-control studies involve matching at the study design stage so as to
make comparator groups more similar to each other. It is important to note that if this