Page 180 - Computational Retinal Image Analysis

2  Data classification, data capture and data management  175




                  of analysis. If observations come from different individuals they may be regarded as
                  independent. If, however, there is a relationship between observations (for example,
                  the intraocular pressure in a glaucomatous eye before and after delivery of eye drops,
                  or the intraocular pressure of the right and left eyes of the same individual), the
                  data are not independent. Imaging data may contain several thousand values measured
                  on a single eye, and these values are not independent of one another. It is therefore
                  important that the statistical technique used to explore such data addresses both the
                  potential nonindependence and the unit of analysis [12].
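
One simple way to respect the unit of analysis is to collapse eye-level measurements to a single value per individual before analysis. The sketch below assumes a hypothetical record layout (patient identifier, eye, intraocular pressure in mmHg) with invented values. Averaging the two eyes is only one option; methods that model the within-person correlation directly may be preferable.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical eye-level records: (patient_id, eye, intraocular pressure in mmHg).
eye_records = [
    ("P01", "right", 18.0),
    ("P01", "left", 20.0),
    ("P02", "right", 25.0),
    ("P02", "left", 23.0),
    ("P03", "right", 15.0),  # only one eye was measured for this patient
]


def collapse_to_patient(records):
    """Average eye-level values so each patient contributes one observation."""
    by_patient = defaultdict(list)
    for patient_id, _eye, iop in records:
        by_patient[patient_id].append(iop)
    return {pid: mean(values) for pid, values in by_patient.items()}


# Five eye-level rows become three patient-level observations.
patient_iop = collapse_to_patient(eye_records)
```

After this step the patient, not the eye, is the unit of analysis, so methods that assume independent observations are no longer applied to paired eyes.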
                     A further issue to consider is whether a value being analyzed is an actual
                  measurement or a summary score that represents some pre-processing of the data. If
                  the latter, it is necessary to know how the pre-processing was done. Failure to do
                  so may result in spurious associations between variables being seen (see, e.g.,
                  ocular perfusion and intraocular pressure in Refs. [5, 6]). Measuring devices often
                  pre-process the data, a point that is often forgotten.
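
A small simulation can illustrate how a derived score produces a spurious association with one of its own components. It assumes the commonly used definition of mean ocular perfusion pressure (two thirds of mean arterial pressure minus intraocular pressure). The distributions and parameter values are invented for illustration and are not taken from Refs. [5, 6].

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 2000

# Simulated values (mmHg), generated independently of each other by construction.
mean_arterial_pressure = [rng.gauss(95, 10) for _ in range(n)]
intraocular_pressure = [rng.gauss(16, 3) for _ in range(n)]

# Derived score: intraocular pressure is part of its definition (assumed formula).
perfusion_pressure = [
    (2 / 3) * m - p for m, p in zip(mean_arterial_pressure, intraocular_pressure)
]

r_raw = pearson(mean_arterial_pressure, intraocular_pressure)  # close to zero
r_derived = pearson(perfusion_pressure, intraocular_pressure)  # clearly negative
```

The two raw measurements are unrelated, yet the derived score correlates negatively with intraocular pressure purely because intraocular pressure appears in its formula.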


                  2.2  Data collection and management
                  Many statistical textbooks and courses on statistics begin with a clean data set.
                  Unfortunately, in the real world researchers are often faced with something very
                  different. They are presented with data sets that may have missing values for some
                  patients, values recorded that are not feasible, dates captured in varying formats
                  (day/month/year or month/day/year), and variables captured as free-text fields.
                  Below are two tables from
                  spreadsheets (both fictitious). One would require considerable modification prior
                  to data analysis (Table 1) while the other would not (Table 2). An example of the
                  modification needed is converting all captured weights to the same units, rather
                  than alternating between kilograms and stones and pounds. If weights in differing
                  units were read as a single variable, any summary of such data would be
                  meaningless. In the dirty spreadsheet, Ethnicity has been
                  captured as free text. A variety of entries have been made for this variable but if
                  we consider the category White there are three terms (White, W and w) that have
                  been used within this column to indicate that the subject was white. Prior to data
                  analysis these need to be converted into the same term so that when the categories
                  are summed, the correct totals are provided rather than having to tally several sub-
                  totals. The example (Tables 1 and 2) illustrates a very small data set, but consider
                  this amplified by a factor of tens, hundreds or thousands. While code can be written
                  to facilitate the data conversion, writing such code can be time consuming and
                  may introduce error. This can be avoided by carefully considering how to capture
                  data correctly in the first place. Time spent planning data capture (avoiding free
                  text and using standard coding systems where possible, such as ICD coding for
                  capturing disease) means that data analysis can be conducted efficiently and results
                  delivered in a timely fashion.
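
The two conversions described above (harmonizing weight units and collapsing free-text ethnicity entries) might be sketched as follows. The input formats, function names and sample values are hypothetical, since the actual contents of Tables 1 and 2 are not reproduced here. Only the three terms for White named in the text are mapped, and anything unrecognized is returned as `None` for manual review rather than guessed.

```python
import re
from collections import Counter

KG_PER_POUND = 0.45359237  # exact definition of the avoirdupois pound
POUNDS_PER_STONE = 14

NUMBER = r"(\d+(?:\.\d+)?)"
POUND_UNIT = r"(?:lbs|lb|pounds?)"


def weight_to_kg(raw):
    """Convert text such as '70 kg', '11 st 4 lb' or '154 lb' to kilograms.

    Returns None when the entry is not recognized (hypothetical formats only).
    """
    text = raw.strip().lower()
    match = re.fullmatch(NUMBER + r"\s*kg", text)
    if match:
        return float(match.group(1))
    match = re.fullmatch(
        r"(\d+)\s*st(?:one)?s?(?:\s*" + NUMBER + r"\s*" + POUND_UNIT + r")?", text
    )
    if match:
        pounds = int(match.group(1)) * POUNDS_PER_STONE + float(match.group(2) or 0)
        return pounds * KG_PER_POUND
    match = re.fullmatch(NUMBER + r"\s*" + POUND_UNIT, text)
    if match:
        return float(match.group(1)) * KG_PER_POUND
    return None


# Only the variants mentioned in the text; other categories would be added here.
ETHNICITY_TERMS = {"white": "White", "w": "White"}


def standardize_ethnicity(raw):
    """Map a free-text entry to one agreed term, or None if it is unrecognized."""
    return ETHNICITY_TERMS.get(raw.strip().lower())


# Invented sample entries in the style of a dirty spreadsheet.
weights_kg = [weight_to_kg(w) for w in ["70 kg", "11 st 4 lb", "154 lb", "heavy"]]
ethnicity_counts = Counter(standardize_ethnicity(e) for e in ["White", "W", "w"])
```

Even this short sketch has to anticipate several spellings and formats, which is the time cost and source of error the text warns about. Capturing weight as a numeric field in one unit and ethnicity as a coded choice would make the code unnecessary.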