Page 192 - Computational Retinal Image Analysis
5 Missingness of data
5.1 Main mechanisms of data missingness
With the increase in the number of randomized clinical trials and cohort studies
analyzing imaging data, even the most robustly designed study can have missing data
[27]. Missing data (missing values or missing images) occur when no value is
collected and stored for a variable in an observation, e.g. an eye or a patient.
Missing data are a common problem in research and can have a negative impact on the
conclusions of the analysis. Here, we first discuss the types of missingness and then
the strategies for dealing with missing data. What we hope emerges is a very clear
message: there is no ideal solution to missing data, and prevention is the best
strategy [18].
It is important to identify why data are missing. Missing data can arise from
nonresponse for the whole patient (e.g. the patient drops out of the study early) or
for some variables of a patient (e.g. the patient is in the study but is not able to
have fluorescein angiography taken, or data are lost). The reasons for missingness
vary: sometimes missing values are caused by the researcher (e.g. when data
collection is done improperly or mistakes are made in data entry) and sometimes by
the patient (e.g. the patient refuses to report or to have a measurement done, or
leaves the study). Different reasons for missingness can affect the validity of the
research conclusions differently.
There are three main types of data missingness. Suppose we are studying visual
acuity (Y) as a function of diabetes (X). Some patients will not have their visual
acuity recorded, so some values of Y are missing. There are three possible
mechanisms for the missingness:
• There may be no particular reason why some patients have Y recorded and others
do not. That is, the probability that Y is missing has no relationship to X or Y:
the pattern of missingness is independent of the missing values and of the values
of any measured variables. In this case the data are missing completely at random
(MCAR).
• Those without diabetes may be less likely to have their visual acuity recorded.
That is, the probability that Y is missing depends only on the value of X. Such
data are missing at random (MAR).
• Those with good sight may be less likely to have their visual acuity recorded.
That is, the probability that Y is missing depends on the unobserved value of Y
itself. Such data are missing not at random (MNAR).
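The three mechanisms above can be illustrated with a small simulation. The following is a minimal sketch, not an analysis from the studies cited: the sample size, the prevalence of diabetes, the use of a logMAR scale for Y, and all missingness probabilities are assumed values chosen only for illustration. It deletes values of Y under each mechanism and compares the mean of the observed Y, overall and among patients without diabetes, with the corresponding mean in the full data.

```python
import random
from statistics import mean

random.seed(0)
n = 20000  # assumed sample size

# Hypothetical data: X = diabetes (1 = yes, 0 = no),
# Y = visual acuity in logMAR (higher values mean worse sight).
x = [1 if random.random() < 0.3 else 0 for _ in range(n)]
y = [0.2 + 0.3 * xi + random.gauss(0, 0.2) for xi in x]

# Probability that Y is missing for a patient, under each mechanism.
# All probabilities are illustrative assumptions.
mechanisms = {
    "MCAR": lambda xi, yi: 0.3,                        # unrelated to X and Y
    "MAR": lambda xi, yi: 0.5 if xi == 0 else 0.1,     # depends only on X
    "MNAR": lambda xi, yi: 0.6 if yi < 0.25 else 0.1,  # depends on Y itself
}


def observed(p_missing):
    """Return the (x, y) pairs left after deleting Y with probability p_missing(x, y)."""
    return [(xi, yi) for xi, yi in zip(x, y) if random.random() >= p_missing(xi, yi)]


def summary(pairs):
    """Mean of Y overall and among patients without diabetes."""
    overall = mean(yi for _, yi in pairs)
    no_diabetes = mean(yi for xi, yi in pairs if xi == 0)
    return overall, no_diabetes


full = summary(list(zip(x, y)))
results = {name: summary(observed(p)) for name, p in mechanisms.items()}

print(f"{'data':<6}{'mean Y':>10}{'mean Y | X=0':>16}")
print(f"{'full':<6}{full[0]:>10.3f}{full[1]:>16.3f}")
for name, (overall, no_diabetes) in results.items():
    print(f"{name:<6}{overall:>10.3f}{no_diabetes:>16.3f}")
```

Under these assumptions, the MCAR means should stay close to the full-data means. Under MAR the overall mean shifts, but the mean among patients without diabetes does not, because missingness depends only on X. Under MNAR the mean shifts even within that group.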
5.2 Main strategies to tackle missing data
There are several strategies to analyze the dataset if missingness is present. Here we
highlight the main points:
• The most common approach to missing data is to analyze only those with complete
data: an available case or complete case (CC) analysis (see scenario 1 in
Ref. [18]), in which cases or subjects with missing data are omitted from the
analysis.
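A complete case analysis can be sketched in a few lines. The example below is only an illustration under stated assumptions: the records, the variable names (`patient`, `diabetes`, `visual_acuity`), and the helper `complete_cases` are hypothetical, and missing values are marked with `None`. It omits every patient with a missing analysis variable and then summarizes visual acuity by diabetes status.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
# Visual acuity is assumed to be in logMAR.
records = [
    {"patient": 1, "diabetes": 1, "visual_acuity": 0.5},
    {"patient": 2, "diabetes": 0, "visual_acuity": 0.0},
    {"patient": 3, "diabetes": 1, "visual_acuity": None},
    {"patient": 4, "diabetes": 0, "visual_acuity": 0.25},
    {"patient": 5, "diabetes": None, "visual_acuity": 0.25},
    {"patient": 6, "diabetes": 1, "visual_acuity": 0.75},
]


def complete_cases(rows, variables):
    """Keep only the rows in which every analysis variable is observed."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


cc = complete_cases(records, ["diabetes", "visual_acuity"])
n_dropped = len(records) - len(cc)

# Mean visual acuity by diabetes status, using complete cases only.
group_means = {
    status: mean(r["visual_acuity"] for r in cc if r["diabetes"] == status)
    for status in (0, 1)
}

print(f"analyzed {len(cc)} of {len(records)} patients ({n_dropped} omitted)")
print(group_means)
```

Reporting how many subjects were omitted, as the sketch does, keeps the extent of the missingness visible alongside the results.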