Computational Retinal Image Analysis

8   Ophthalmic imaging data challenges   193




Both statistics and ML are scientific disciplines that extract knowledge from data.
They are evolving disciplines, so their remit changes over time [46]. We would define
them as follows. Statistics is a scientific discipline that provides a framework for
learning from data through statistical algorithms and statistical inference. Statistical
algorithms, in a broad sense, perform parameter estimation; they include exploratory
analyses (e.g. averages, histograms) and predictive and prognostic modeling (predicting
future observations, discrimination methods, classification). Statistical inference is
concerned with the accuracy of these algorithms and with drawing conclusions about the
population (explanatory modeling); it includes P-values, confidence intervals, and other
measures of uncertainty. Machine learning is a scientific discipline that focuses
primarily on predictive and prognostic algorithms (deep neural networks, support vector
machines, etc.), and its strength lies in the computational power and speed it brings to
high-dimensional data such as images. Data science is a scientific discipline that mainly
focuses on exploratory and explanatory (i.e. inferential) goals [46]. It investigates
data-generating mechanisms and generates new hypotheses.
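The split between a statistical algorithm and statistical inference can be made concrete
with a small sketch. Here the "algorithm" is the sample mean and the "inference" is a
percentile bootstrap confidence interval that quantifies its uncertainty. The function
name `mean_with_bootstrap_ci`, the choice of the bootstrap as the inference method, and
the measurement values are all hypothetical, chosen only for illustration.

```python
import random
import statistics

def mean_with_bootstrap_ci(data, n_boot=2000, alpha=0.05, seed=0):
    """Return the sample mean (the algorithm's estimate) and a percentile
    bootstrap confidence interval for it (the inferential statement)."""
    rng = random.Random(seed)
    estimate = statistics.fmean(data)
    n = len(data)
    # Resample the data with replacement and recompute the mean each time.
    boot_means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles as the interval.
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return estimate, (boot_means[lo_idx], boot_means[hi_idx])

# Hypothetical measurements (made-up values, not real data).
thickness_um = [262, 270, 255, 281, 266, 259, 274, 268, 263, 277]
est, (lo, hi) = mean_with_bootstrap_ci(thickness_um)
print(f"mean = {est:.1f}, 95% CI = ({lo:.1f}, {hi:.1f})")
```

The point estimate alone answers "what is the average?"; the interval answers "how far
might that average be from the population value?", which is the inferential question.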
Statisticians and computer scientists often use different terms for the same thing,
which can create the illusion that they are describing different phenomena. This can
confuse students and users of computer science and statistics, as well as clinicians.
We have therefore compiled a table of terms with our explanation of their meaning (see
Table 4), extending the table in Ref. [3]. Statistics and machine learning have much in
common: both aim to learn from data, and many of their methods are based on probability.
There are also differences in the methods they use (see Table 4).
Is it worth discussing ML techniques for retinal data analysis from a statistician's
point of view? Yes. Statistics can provide the framework for methods of measurement and
for the validation of diagnostic tests, as well as for estimating and reporting
uncertainty in ML algorithms (e.g. the Monte Carlo dropout method in deep learning).
Furthermore, collaboration between ML and statistics researchers is needed to develop
methods for validating and interpreting ML techniques and to ensure that research is
reproducible.
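Monte Carlo dropout, mentioned above as an example of uncertainty estimation in deep
learning, keeps dropout switched on at prediction time, runs many stochastic forward
passes, and reports the spread of the outputs. The sketch below assumes a tiny
two-layer network with randomly generated weights standing in for a trained model; the
architecture, the function name `mc_dropout_predict`, and all parameter values are
hypothetical, and dropout is applied to the hidden layer only.

```python
import numpy as np

def mc_dropout_predict(x, weights, p_drop=0.5, n_samples=200, seed=0):
    """Monte Carlo dropout: run n_samples forward passes with dropout active
    and return the mean and standard deviation of the predicted probability."""
    rng = np.random.default_rng(seed)
    w1, b1, w2, b2 = weights
    preds = []
    for _ in range(n_samples):
        h = np.maximum(0.0, x @ w1 + b1)             # ReLU hidden layer
        mask = rng.random(h.shape) >= p_drop         # keep each unit with prob 1 - p_drop
        h = h * mask / (1.0 - p_drop)                # inverted-dropout rescaling
        logit = h @ w2 + b2
        preds.append(1.0 / (1.0 + np.exp(-logit)))   # sigmoid output probability
    preds = np.stack(preds)                          # shape (n_samples, n_inputs, 1)
    return preds.mean(axis=0), preds.std(axis=0)

# Hypothetical random weights standing in for a trained network.
rng = np.random.default_rng(42)
weights = (rng.normal(size=(4, 16)), np.zeros(16),
           rng.normal(size=(16, 1)), np.zeros(1))
x = rng.normal(size=(3, 4))                          # three hypothetical feature vectors
mean_prob, std_prob = mc_dropout_predict(x, weights)
```

The mean is used as the prediction and the standard deviation as a per-input
uncertainty measure; with `p_drop=0.0` every pass is identical and the spread collapses
to zero.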
What is the difference between ML and statistics, especially in relation to retinal
imaging? The ML approach is predominantly pixel-wise. Statistical approaches to retinal
imaging require maximization of a likelihood, which becomes tractable only after
suitable data reduction, such as downsampling or dividing the image into a smaller
number of sectors. Statistical methods for ophthalmic images are less common and are
currently an area of intensive research (e.g. Refs. [9, 10, 47]).
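The two data-reduction steps named above can be sketched as follows: block averaging
shrinks the pixel grid, and sector averaging summarizes the whole image by a handful of
numbers. To show why reduction makes likelihood maximization easy, the reduced values
are then fitted with a deliberately simple i.i.d. normal model whose maximum-likelihood
estimates have a closed form. This model is an assumption for illustration, not one of
the spatial models of Refs. [9, 10, 47]; equal angular sectors around the image centre
are also an assumption, and the names `block_downsample`, `sector_means` and
`gaussian_mle` are hypothetical.

```python
import numpy as np

def block_downsample(img, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor            # crop so the blocks tile exactly
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def sector_means(img, n_sectors, center=None):
    """Summarize an image by its mean value in each of n equal angular sectors."""
    h, w = img.shape
    cy, cx = center if center is not None else ((h - 1) / 2, (w - 1) / 2)
    yy, xx = np.mgrid[0:h, 0:w]                      # row and column index grids
    angle = np.mod(np.arctan2(yy - cy, xx - cx), 2 * np.pi)
    labels = np.minimum((angle / (2 * np.pi) * n_sectors).astype(int), n_sectors - 1)
    return np.array([img[labels == k].mean() for k in range(n_sectors)])

def gaussian_mle(values):
    """Closed-form MLE (mean, variance) and maximized log-likelihood
    for an i.i.d. normal model of the reduced data."""
    mu = values.mean()
    sigma2 = ((values - mu) ** 2).mean()
    n = values.size
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return mu, sigma2, loglik

# Hypothetical 64 x 64 "image" of random intensities.
rng = np.random.default_rng(1)
image = rng.normal(loc=100.0, scale=10.0, size=(64, 64))
small = block_downsample(image, 8)                   # 4096 pixels -> 8 x 8 grid
sectors = sector_means(image, 8)                     # 4096 pixels -> 8 sector means
mu, sigma2, ll = gaussian_mle(sectors)
```

Any richer likelihood (for example one with spatial correlation between sectors) would be
fitted to the same reduced vector; the reduction is what keeps the number of terms small.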
Statistical science and probability can provide the theoretical framework for complex
data analysis and machine learning algorithms. Efron and Hastie write, “It is the job of
statistical inference to connect dangling algorithms to the central core of
well-understood methodology. The connection process is already underway.” As an
example, they show how AdaBoost, the original machine learning algorithm, can be
restated as a close cousin of logistic regression. They envisage an optimistic scenario
of “the big-data/data-science prediction world rejoining the mainstream of statistical
inference, to the benefit of both branches” [46].
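One standard way to see the AdaBoost/logistic regression connection, offered here as an
illustration rather than as the exact argument in [46], is to note that AdaBoost builds
an additive score F(x) by stagewise minimization of the exponential loss, and that the
minimizer of the expected exponential loss is half the log-odds. The notation below
(labels y in {-1, +1}, p(x) = P(Y = 1 | x)) is introduced for this sketch.

```latex
\begin{align*}
L\bigl(y, F(x)\bigr) &= e^{-y F(x)}, \qquad y \in \{-1, +1\},\\
\mathbb{E}\bigl[e^{-Y F(x)} \mid x\bigr] &= p(x)\, e^{-F(x)} + \bigl(1 - p(x)\bigr)\, e^{F(x)},
  \qquad p(x) = P(Y = 1 \mid x),\\
\frac{\partial}{\partial F}\,\mathbb{E}\bigl[e^{-Y F(x)} \mid x\bigr]
  &= -p(x)\, e^{-F(x)} + \bigl(1 - p(x)\bigr)\, e^{F(x)} = 0\\
\Longrightarrow \quad F^{*}(x) &= \tfrac{1}{2}\log\frac{p(x)}{1 - p(x)},
  \qquad p(x) = \frac{1}{1 + e^{-2F^{*}(x)}}.
\end{align*}
```

The second derivative, p(x) e^{-F(x)} + (1 - p(x)) e^{F(x)}, is positive, so this
stationary point is the minimum. Logistic regression models the same log-odds,
log(p/(1 - p)), directly (while minimizing the binomial deviance instead of the
exponential loss), so both methods target the same quantity up to a factor of 2 and
turn it into a probability through the same logistic link.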