Page 199 - Computational Retinal Image Analysis

194    CHAPTER 10  Statistics in ophthalmology




Table 4  Summary of the most frequently used terms and some differences between disciplines

| Category | Statistics and data science | Machine learning | Explanation |
|---|---|---|---|
| Terminology used | Data | Training sample | Values of X and Y |
| | Estimation, model fitting | Learning | Using data to estimate an unknown quantity |
| | Model | Network, graphs | Multivariate distribution with assumed relations |
| | Covariates and parameters | Features and weights | The X_i's and β's |
| | Hypothesis and inference | – (ML does not focus on hypothesis testing) | An inductive process to learn about a parameter |
| | Classification, discrimination | Supervised learning | Predicting the value of Y for a single patient (or eye) from X; the groups are known a priori |
| | Cluster analysis, density estimation | Unsupervised learning | Putting data into groups that are not known a priori |
| | Generalization or test set performance | Generalization or test set performance | Evaluating whether the results can be generalized to the whole population |
| | Linear and nonlinear models for prognosis or classification | Probabilistic generative models | A model is fitted to the data and then used to derive a posterior probability for Y |
| Differences | Large grant = £200,000 | Large grant = £1,000,000 | There is a difference in what is considered a large grant. |
| | Publishing new statistical methods in journals, taking 3 years to publish | Publishing new methods in proceedings, taking <1 year to publish | There is a different culture of publishing. |
| | Objectives are mainly in the study design, computation, inference and prediction | Objectives are mainly in prediction and large-scale computation | There is a difference in the objectives of the two disciplines. |
| | Approaches not used in ML, e.g. regression diagnostics, significance testing | Development of new methods not related to statistics, e.g. max-margin methods, support vector machines | There are some methods not shared across the two disciplines. |
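
Several rows of Table 4 can be read together in one small sketch: the data (training sample), estimation (learning), classification with groups known a priori (supervised learning), and a fitted model used to derive a posterior probability for Y. The example below is illustrative only and is not taken from the chapter. The measurements, the group labels and the function names `fit`, `normal_pdf` and `posterior` are all hypothetical, and the Gaussian class-conditional model is an assumption chosen for simplicity.

```python
import math
from statistics import mean, stdev

# Hypothetical toy data (invented values): one covariate X per eye and a
# known group label Y (0 = healthy, 1 = diseased).
x = [1.0, 1.2, 0.8, 1.1, 2.9, 3.1, 3.0, 2.8]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def fit(x, y):
    """Estimation ('learning'): per-group prior, mean and standard deviation."""
    params = {}
    for label in sorted(set(y)):
        values = [xi for xi, yi in zip(x, y) if yi == label]
        params[label] = (len(values) / len(x), mean(values), stdev(values))
    return params


def normal_pdf(v, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (v - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def posterior(params, x_new):
    """Posterior probability of each group given a new X, via Bayes' theorem."""
    joint = {
        label: prior * normal_pdf(x_new, mu, sigma)
        for label, (prior, mu, sigma) in params.items()
    }
    total = sum(joint.values())
    return {label: value / total for label, value in joint.items()}


params = fit(x, y)                       # model fitted to the training sample
post = posterior(params, 1.3)            # posterior probability for Y given X = 1.3
predicted = max(post, key=post.get)      # classification of a single new eye
```

In the statistics vocabulary of the table, `fit` estimates parameters and `posterior` supports classification; in the machine learning vocabulary, the same two steps are learning from a training sample and prediction with a probabilistic generative model.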