Page 198 - Computational Retinal Image Analysis
8 Ophthalmic imaging data challenges
Both statistics and ML are scientific disciplines to extract knowledge from data.
They are evolving disciplines so their remit is changing over time [46]. We would
define these disciplines as follows. Statistics is a scientific discipline that provides
a framework for learning from data via statistical algorithms and statistical inference.
Statistical algorithms, in a broad sense, perform parameter estimation; they include
exploratory analyses (e.g. averages, histograms) and predictive and prognostic modeling
(prediction of future observations, discrimination methods, classification). Statistical
inference focuses on the accuracy of the algorithms and on drawing conclusions about the
population (explanatory modeling). It includes P-values, confidence intervals, and
uncertainty measures. Machine learning is a scientific discipline that focuses primarily
on predictive and prognostic algorithms (deep neural networks, support vector machines,
etc.) and brings strengths in computational power and speed for high-dimensional data
such as images. Data science is a scientific discipline that mainly focuses on
exploratory and explanatory (i.e. inference) goals [46]. It investigates
data-generating mechanisms and generates new hypotheses.
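To make the distinction between a statistical algorithm (estimation) and statistical inference (quantifying uncertainty) concrete, the following is a minimal sketch using only the Python standard library. The measurements are made-up values for illustration, the function names are our own, and the normal approximation is an assumption; with so few observations a t-distribution would usually be preferred.

```python
import math
import statistics
from statistics import NormalDist


def mean_with_ci(values, level=0.95):
    """Estimate the mean and a normal-approximation confidence interval."""
    n = len(values)
    mean = statistics.fmean(values)                 # estimation step
    se = statistics.stdev(values) / math.sqrt(n)    # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)       # e.g. about 1.96 for 95%
    return mean, mean - z * se, mean + z * se


def two_sided_p_value(values, null_mean):
    """Normal-approximation P-value for H0: population mean equals null_mean."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    z = (statistics.fmean(values) - null_mean) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical retinal thickness measurements in micrometres (illustrative only).
thickness = [250.0, 262.0, 247.0, 255.0, 259.0, 251.0, 264.0, 248.0]

mean, ci_low, ci_high = mean_with_ci(thickness)
p_value = two_sided_p_value(thickness, null_mean=250.0)
print(f"mean={mean:.1f}, 95% CI=({ci_low:.1f}, {ci_high:.1f}), P={p_value:.3f}")
```

The average alone is an exploratory summary; the interval and the P-value are the inferential part, since they describe how far the estimate can be trusted as a statement about the population.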
Statisticians and computer scientists often use different language for the same
thing, which may create the illusion that they are speaking about different phenomena.
This may lead to confusion among students and users of computer science and statistics,
as well as among clinicians. We have therefore created a table of terms with our
explanation of their meaning (see Table 4), which extends the table in Ref. [3]. There
are many similarities between statistics and machine learning: both aim to learn from
data, and many of their methods are based on probability. There are also differences
(see Table 4) in the methods that they use.
Is it worth discussing ML techniques for retinal data analysis from the point of
view of statisticians? Yes, it is. Statistics can provide the framework for methods
of measurement and validation of diagnostic tests, as well as for estimating and
reporting uncertainty in ML algorithms (e.g. the Monte Carlo dropout method in deep
learning algorithms). Furthermore, collaboration between ML and statistics researchers
is needed to develop the validation and interpretation of ML techniques and to support
reproducible research.
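The Monte Carlo dropout idea mentioned above can be sketched in a few lines: dropout is kept active at prediction time, the same input is passed through the network many times, and the spread of the outputs is reported as an uncertainty measure. The toy network below, its fixed weights, and all function names are hypothetical and chosen only for illustration; a real application would use a trained deep network in a deep learning framework.

```python
import random
import statistics


def forward_with_dropout(x, w_hidden, w_out, drop_prob, rng):
    """One stochastic forward pass: ReLU hidden layer, dropout, linear output."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    keep = 1.0 - drop_prob
    # Inverted dropout: zero each unit with probability drop_prob, rescale the rest.
    dropped = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    return sum(w * h for w, h in zip(w_out, dropped))


def mc_dropout_predict(x, w_hidden, w_out, drop_prob=0.5, n_passes=500, seed=0):
    """Return the mean and standard deviation over repeated stochastic passes."""
    rng = random.Random(seed)
    outputs = [forward_with_dropout(x, w_hidden, w_out, drop_prob, rng)
               for _ in range(n_passes)]
    return statistics.fmean(outputs), statistics.stdev(outputs)


# Hypothetical fixed weights and input (illustrative only, not a trained model).
W_HIDDEN = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
W_OUT = [1.0, -0.5, 0.8]
x = [1.0, 2.0]

pred_mean, pred_sd = mc_dropout_predict(x, W_HIDDEN, W_OUT)
print(f"prediction={pred_mean:.3f}, uncertainty (SD)={pred_sd:.3f}")
```

Setting `drop_prob=0.0` recovers the ordinary deterministic prediction with zero spread, which makes clear that the reported uncertainty comes entirely from the random dropout masks.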
What is the difference between ML and statistics, especially in relation to retinal
imaging? The ML approach is predominantly pixel-wise oriented. Statistical approaches
to retinal imaging require maximization of a likelihood, which is tractable only after
a suitable data reduction such as downsampling or division of the image into a smaller
number of sectors. Statistical methods for ophthalmic images are less common and are
now an intensive area of research, e.g. Refs. [9, 10, 47].
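As an illustration of the data reduction step described above, the sketch below averages a pixel image over a coarse grid of sectors, so that a likelihood-based model would work with a handful of sector summaries rather than every pixel. The rectangular grid, the use of the mean as the summary, and the function name `sector_means` are assumptions made for this example; the cited methods may partition the image differently.

```python
def sector_means(image, n_rows, n_cols):
    """Average a 2-D image (list of equal-length rows) over an n_rows x n_cols grid."""
    height, width = len(image), len(image[0])
    if height % n_rows or width % n_cols:
        raise ValueError("image size must be divisible by the sector grid")
    block_h, block_w = height // n_rows, width // n_cols
    reduced = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Collect all pixels that fall inside sector (r, c).
            block = [image[i][j]
                     for i in range(r * block_h, (r + 1) * block_h)
                     for j in range(c * block_w, (c + 1) * block_w)]
            row.append(sum(block) / len(block))
        reduced.append(row)
    return reduced


# Toy 4 x 4 "image" with made-up intensities, reduced to a 2 x 2 grid of sectors.
image = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
reduced = sector_means(image, 2, 2)
print(reduced)  # [[1.0, 2.0], [3.0, 4.0]]
```

Here 16 pixel values are reduced to 4 sector means, which is the kind of dimensionality reduction that keeps a likelihood over the image tractable.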
Statistical science and probability can provide the theoretical framework for
complex data analysis and machine learning algorithms. Efron and Hastie say, “It
is the job of statistical inference to connect dangling algorithms to the central core
of well-understood methodology. The connection process is already underway.” As
an example, they illustrate how Adaboost, the original machine learning algorithm,
could be restated as a close cousin of logistic regression. They envisage an optimistic
scenario of “the big-data/data-science prediction world rejoining the mainstream of
statistical inference, to the benefit of both branches” [46].
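One standard way to see the kinship between Adaboost and logistic regression is sketched below. It is a well-known result stated in our own notation, not necessarily the exact presentation in [46]. We assume class labels y in {-1, +1}, write p(x) = P(y = 1 | x), and let F(x) denote the additive score built from the weak learners.

```latex
% Adaboost can be viewed as stagewise minimization of the exponential loss:
\[
  L_{\exp}(F) = \mathbb{E}\!\left[ e^{-y F(x)} \right],
  \qquad
  F^{*}(x) = \arg\min_{F} L_{\exp}(F)
           = \tfrac{1}{2} \log \frac{p(x)}{1 - p(x)} .
\]
% Logistic regression minimizes the binomial deviance (negative log-likelihood),
% written here for the same half-log-odds scale F:
\[
  L_{\mathrm{logit}}(F) = \mathbb{E}\!\left[ \log\!\left( 1 + e^{-2 y F(x)} \right) \right],
  \qquad
  \arg\min_{F} L_{\mathrm{logit}}(F)
           = \tfrac{1}{2} \log \frac{p(x)}{1 - p(x)} .
\]
```

Both losses are minimized at the population level by the same function, half the log-odds of the positive class. The two methods therefore estimate the same quantity and differ mainly in the loss used to fit it and in how the additive score F is built.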