Page 178 - Computational Retinal Image Analysis
build this on principles of combinatorics and probability, and by doing so they quantify knowledge and rationalize the learning and decision-making process.
There are two main tenets of statistics [1]:
• Statistical models are used to express knowledge and uncertainty about a signal
in the presence of noise, via inductive reasoning.
• Statistical methods may be analyzed to determine how well they are likely
to perform. These methods include: exploratory data analysis, statistical
inference, predictive or prognostic modeling, and methods of discrimination or
classification. The performance of the methods depends on data collection and
management and study designs.
Here we elaborate on these principles and mention ways in which statistics has contributed to ophthalmology. The first tenet says that statistical models serve to describe the regularity and variation of data in terms of probability distributions. When data are collected repeatedly, under conditions that are as nearly identical as an investigator can make them, the measured responses exhibit variation. For example, the corneal curvature measured via the maximum keratometric reading (Kmax) exhibits variation between measurements done on the same day with a short break between the measurements [2]. Therefore, the most fundamental principle of statistics, and its starting point, is to describe this variability via probability distributions. What do we mean? A probability distribution allows us to describe the measured curvature as the sum of the mean curvature, the regular component, and a random variation due to random changes in lighting conditions or in the direction of the camera, the noise component.
For example, the underlying mathematical expression of the signal, y = f(x), is replaced by a statistical model of the form Y = f(x) + ε, where the capital letter Y denotes the random variable that we aim to measure and ε is the noise; hence the expression becomes signal plus noise. The simplest form is when f(x) is a constant, such as a mean μ, or a linear regression line, such as β₀ + β₁x, where the coefficients μ, β₀, β₁ are unknown parameters, denoted by Greek letters to follow statistical convention. The parameters are then estimated together with some measure of uncertainty about them (e.g. a P-value or a confidence interval), and this is used to make inferences about the values of the parameters μ, β₀, β₁.
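As an illustration of this signal-plus-noise idea (a sketch, not from the chapter; the true parameter values and noise level below are arbitrary choices), one can simulate data from the model Y = β₀ + β₁x + ε and recover the unknown parameters, with a P-value and a confidence interval quantifying the uncertainty:

```python
import numpy as np
from scipy import stats

# Simulate the statistical model Y = beta0 + beta1 * x + eps
# (the "true" parameters here are arbitrary, chosen for illustration)
rng = np.random.default_rng(42)
beta0_true, beta1_true, sigma = 2.0, 0.5, 1.0
x = np.linspace(0.0, 10.0, 100)
y = beta0_true + beta1_true * x + rng.normal(0.0, sigma, size=x.size)

# Estimate the unknown parameters from the noisy observations
fit = stats.linregress(x, y)
print(f"beta0 estimate: {fit.intercept:.3f}")
print(f"beta1 estimate: {fit.slope:.3f}")
print(f"P-value for H0: beta1 = 0: {fit.pvalue:.2e}")

# 95% confidence interval for the slope: estimate +/- t * standard error
t_crit = stats.t.ppf(0.975, df=x.size - 2)
ci = (fit.slope - t_crit * fit.stderr, fit.slope + t_crit * fit.stderr)
print(f"95% CI for beta1: ({ci[0]:.3f}, {ci[1]:.3f})")
```

With enough data and well-behaved noise, the estimates land close to the true values, and the confidence interval conveys how uncertain that recovery is.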
The second tenet says that each statistical method is a tool whose properties can be analyzed. This is an important statement. It means that each statistical method (such as the two-sample independent t-test) works well only in certain scenarios. The two-sample t-test works well if the assumptions on which the test was built are satisfied: that the measured data are normally distributed and that two independent random samples were obtained (such as two unrelated groups of patients). However, statistical theory can be used to study how well a two-sample t-test would work if some of the assumptions are not satisfied, i.e. whether it is robust to some violations [3].
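A minimal sketch of the two-sample scenario discussed above (the group sizes, means, and standard deviations are invented for illustration): two independent groups are simulated and compared, first with the classic t-test, then with Welch's variant, which relaxes the equal-variance assumption:

```python
import numpy as np
from scipy import stats

# Two independent "groups of patients" (simulated; values are illustrative)
rng = np.random.default_rng(0)
group_a = rng.normal(45.0, 2.0, size=30)  # e.g. readings from group A
group_b = rng.normal(48.0, 2.0, size=30)  # group B, with a shifted mean

# Classic two-sample t-test: assumes normality and equal variances
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.2e}")

# Welch's variant drops the equal-variance assumption -- one example of
# adapting a method when one of its assumptions may be violated
t_w, p_w = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"Welch: t = {t_w:.3f}, p = {p_w:.2e}")
```

When the groups really do share a common variance, the two versions give similar answers; when they do not, the Welch form is the more robust choice.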