where all convergence criteria are met. The overall goal is to
learn how to reach that state.
• Actions modify the parameters x to fulfill the objectives c. An
  action a ∈ A consists in either incrementing or decrementing one
  parameter x_i by 1×, 10× or 100× a user-specified reference value
  δ_i ∈ δ. This quantization of the intrinsically continuous action
  space can be defined empirically (see the sketch after this list).
• Transition function encodes the learnt knowledge about the
  computational model f; see the following paragraphs for details.
• Rewards are defined as R(s, a, s′) = −1 (punishment), except when
  an action resulting in personalization success was performed, in
  which case R(·, ·, ŝ) = 0.
• Discount factor is designed to encourage finding a policy
  favoring future over immediate rewards to avoid local minima;
  therefore a large γ = 0.99 is used.
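To make these definitions concrete, the following minimal Python sketch (not part of the original text) shows one possible encoding of the quantized action set, the per-step reward, and the discount factor. The parameter dimension, the reference values δ_i, and all names (delta, ACTIONS, apply_action, reward) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

# Reference step sizes delta_i, one per model parameter (placeholder values).
delta = np.array([0.01, 0.1, 1.0])

# Quantized action set: in- or decrement one parameter x_i by 1x, 10x or 100x delta_i.
ACTIONS = [(i, sign * scale)
           for i in range(len(delta))
           for sign in (+1, -1)
           for scale in (1, 10, 100)]

def apply_action(x, action):
    """Return x_{t+1} = a_t(x_t) for a single quantized parameter update."""
    i, step = action
    x_new = np.array(x, dtype=float)
    x_new[i] += step * delta[i]
    return x_new

def reward(success):
    """R(s, a, s') = -1 for every step, 0 once personalization succeeds (state s_hat)."""
    return 0.0 if success else -1.0

GAMMA = 0.99  # large discount factor favoring future over immediate rewards
```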

                     5.3.1.2 Learning model behavior through exploration
                        To learn the behavior of the model f when its parameter val-
                     ues are changed, we perform model exploration episodes. Each
                     episode is initiated with random, but physiologically plausible
                     model parameters x_t ∈ Ω and one of the available training pa-
                     tients. From the outputs of a forward model run, y_t = f(x_t), the
                     misfits to the patient's corresponding measurements z are com-
                     puted, yielding the objectives vector c_t = c(y_t, z). We then em-
                     ploy a random exploration policy π_rand, i.e., a policy that selects
                     random actions according to a discrete uniform probability dis-
                     tribution. The selected action is applied to the current parame-
                     ter vector, yielding x_{t+1} = a_t(x_t). The forward model is computed,
                     y_{t+1} = f(x_{t+1}), yielding c_{t+1}. The next action a_{t+1} is selected ac-
                     cording to π_rand, and so on. This process is repeated n_e-steps = 100
                     times in our experiments to complete one episode, which can be
                     written as:

                        {(x_t, y_t, c_t, a_t, x_{t+1}, y_{t+1}, c_{t+1}), t = 0, ..., n_e-steps − 1}.   (5.2)
                     Many such episodes are typically generated per available training
                     patient and later combined into one multi-patient set of episodes,
                     denoted E. It is important to have multiple training patients to
                     allow abstraction from patient-specific to model-specific knowl-
                     edge.
                        Note that this is done only once, at training time, and is not
                     repeated for each personalization procedure. Exploration can
                     therefore be as extensive as needed.
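As a rough illustration of how the exploration set E could be assembled, the following hedged Python sketch generates one episode with the random policy and reuses the ACTIONS list and apply_action helper from the previous sketch. The forward model f, the measurements z, the plausible-parameter sampler, and the simple misfit form c(y, z) = y − z are assumptions made only for this example; the text does not prescribe this API.

```python
import numpy as np

def run_exploration_episode(f, z, sample_plausible_params, n_e_steps=100, rng=None):
    """Generate one exploration episode with the random policy pi_rand.

    All argument names are hypothetical placeholders for this sketch.
    """
    if rng is None:
        rng = np.random.default_rng()
    episode = []
    x = sample_plausible_params(rng)   # random but physiologically plausible x_0 in Omega
    y = f(x)                           # forward model run y_t = f(x_t)
    c = y - z                          # objectives c_t = c(y_t, z); misfit form assumed here
    for _ in range(n_e_steps):
        a = ACTIONS[rng.integers(len(ACTIONS))]  # pi_rand: discrete uniform over actions
        x_next = apply_action(x, a)              # x_{t+1} = a_t(x_t)
        y_next = f(x_next)                       # y_{t+1} = f(x_{t+1})
        c_next = y_next - z                      # c_{t+1}
        episode.append((x, y, c, a, x_next, y_next, c_next))  # one tuple of Eq. (5.2)
        x, y, c = x_next, y_next, c_next
    return episode

# Many such episodes per training patient would be combined into the
# multi-patient set E (patient data and samplers below are placeholders):
# E = [run_exploration_episode(f_p, z_p, sampler_p)
#      for (f_p, z_p, sampler_p) in patients
#      for _ in range(n_episodes_per_patient)]
```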