Page 192 - Computational Retinal Image Analysis

                  5  Missingness of data

                  5.1  Main mechanisms of data missingness
                  As more randomized clinical trials and cohort studies analyze imaging data, it
                  has become clear that even the most robustly designed study can have missing data
                  [27]. Missing data (missing values or missing images) occur when no value is
                  collected and stored for a variable in an observation, e.g. an eye or a patient.
                  Missing data are a common problem in research and can compromise the validity of
                  the conclusions drawn from the analysis. Here, we first discuss the types of
                  missingness and then the strategies for dealing with missing data. The clear
                  message we hope emerges is that there is no ideal solution to missing data and
                  that prevention is the best strategy [18].
                     It is important to identify why data are missing. Missing data can occur because
                  of nonresponse for the whole patient (e.g. the patient drops out of the study
                  early), or for only some variables of a patient (e.g. the patient is in the study
                  but is not able to have fluorescein angiography taken, or data are lost). The
                  reasons for missingness vary: sometimes missing values are caused by the
                  researcher (e.g. when data collection is done improperly or mistakes are made in
                  data entry) and sometimes by the patient (e.g. the patient refuses to report or to
                  have a measurement done, or leaves the study). Different reasons for missingness
                  can affect the validity of research conclusions differently.
                     There are three main types of data missingness. Suppose we are studying visual
                  acuity (Y) as a function of diabetes (X). Some patients do not have their visual
                  acuity recorded, so some values of Y are missing. There are three possible
                  mechanisms for the missingness:
                  •  There may be no particular reason why some patients have Y recorded and others
                     do not. That is, the probability that Y is missing is unrelated to X or Y: the
                     pattern of missingness is independent of both the missing values and the values
                     of any measured variables. In this case the data are missing completely at
                     random (MCAR).
                  •  Those without diabetes may be less likely to have their visual acuity recorded.
                     That is, the probability that Y is missing depends only on the value of X. Such
                     data are missing at random (MAR).
                  •  Those with good sight may be less likely to have their visual acuity recorded.
                     That is, the probability that Y is missing depends on the unobserved value of Y
                     itself. Such data are missing not at random (MNAR).
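
                  The three mechanisms can be illustrated with a small simulation. The following is
                  a minimal sketch, not part of the original analysis: the function name simulate,
                  the acuity scale (a hypothetical letter score in which higher means better
                  sight), the 30% diabetes prevalence, and every missingness probability are
                  assumptions chosen only for illustration.

```python
import random
import statistics


def simulate(mechanism, n=20000, seed=0):
    """Return (mean of Y in all patients, mean of Y in patients with Y recorded).

    All numeric settings below are illustrative assumptions, not estimates
    taken from any study.
    """
    rng = random.Random(seed)
    all_y, observed_y = [], []
    for _ in range(n):
        # X: diabetes indicator (assumed 30% prevalence).
        x = 1 if rng.random() < 0.3 else 0
        # Y: visual acuity on a hypothetical letter score (higher = better sight).
        y = rng.gauss(75.0 if x else 85.0, 8.0)

        if mechanism == "MCAR":
            # Missingness is unrelated to X or Y.
            p_missing = 0.3
        elif mechanism == "MAR":
            # Missingness depends only on X: patients without diabetes
            # are less likely to have Y recorded.
            p_missing = 0.1 if x else 0.5
        elif mechanism == "MNAR":
            # Missingness depends on Y itself: patients with good sight
            # are less likely to have Y recorded.
            p_missing = 0.6 if y > 82.0 else 0.1
        else:
            raise ValueError(f"unknown mechanism: {mechanism!r}")

        all_y.append(y)
        if rng.random() >= p_missing:  # Y is recorded for this patient
            observed_y.append(y)

    return statistics.fmean(all_y), statistics.fmean(observed_y)


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        full_mean, observed_mean = simulate(mechanism)
        print(f"{mechanism}: all patients {full_mean:.2f}, "
              f"recorded only {observed_mean:.2f}")
```

                  Under these assumed settings, the mean of the recorded values stays close to the
                  mean over all patients for MCAR and falls below it for MAR and MNAR. In the MAR
                  case the shift arises only because X is ignored: within each level of X the
                  recorded values remain representative, which is not true under MNAR.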
                  5.2  Main strategies to tackle missing data

                  Several strategies exist for analyzing a dataset in which data are missing. Here
                  we highlight the main ones:
                  •  The most common approach to missing data is to analyze only those with
                     complete data. This is an available case or complete case (CC) analysis
                     (see scenario 1 in Ref. [18]), in which cases or subjects with missing data
                     are omitted from the analysis.
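
                  A complete case analysis amounts to a filtering step before the analysis proper.
                  The sketch below is illustrative only: the helper complete_cases, the record
                  layout, the variable names, and the data values are all hypothetical, and None
                  is assumed to mark a missing value.

```python
def complete_cases(records, variables):
    """Keep only the records in which every listed variable is present.

    A value of None (or an absent key) is treated as missing; this
    convention is an assumption of the sketch.
    """
    return [
        record for record in records
        if all(record.get(name) is not None for name in variables)
    ]


# Hypothetical toy data: one row per patient.
records = [
    {"patient": 1, "diabetes": 1, "visual_acuity": 70},
    {"patient": 2, "diabetes": 0, "visual_acuity": None},  # Y missing
    {"patient": 3, "diabetes": None, "visual_acuity": 85},  # X missing
    {"patient": 4, "diabetes": 0, "visual_acuity": 88},
]

cc = complete_cases(records, ["diabetes", "visual_acuity"])
n_dropped = len(records) - len(cc)

if __name__ == "__main__":
    print("patients kept:", [record["patient"] for record in cc])
    print("patients dropped:", n_dropped)
```

                  Reporting the number of omitted subjects alongside the results makes it visible
                  how much of the sample the analysis actually rests on.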