where all convergence criteria are met. The overall goal is to
learn how to reach that state.
• Actions modify the parameters x to fulfill the objectives c. An
  action a ∈ A consists in either incrementing or decrementing one
  parameter x_i by 1×, 10× or 100× a user-specified reference value
  δ_i ∈ δ. This quantization of the intrinsically continuous action
  space can be defined empirically (see the sketch after this list).
• Transition function encodes the learnt knowledge about the
  computational model f; see the following paragraphs for details.
• Rewards are defined as R(s, a, s′) = −1 (punishment), except when
  an action resulting in personalization success was performed, in
  which case R(·, ·, ŝ) = 0.
• Discount factor is designed to encourage finding a policy
  favoring future over immediate rewards to avoid local minima;
  therefore a large γ = 0.99 is used.
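To make these definitions concrete, the following minimal Python sketch (not part of the original text) shows one possible encoding of the quantized action set, the per-step reward, and the discount factor. The parameter dimension, the reference values δ_i, and all names (delta, ACTIONS, apply_action, reward) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

# Reference step sizes delta_i, one per model parameter (placeholder values).
delta = np.array([0.01, 0.1, 1.0])

# Quantized action set: in- or decrement one parameter x_i by 1x, 10x or 100x delta_i.
ACTIONS = [(i, sign * scale)
           for i in range(len(delta))
           for sign in (+1, -1)
           for scale in (1, 10, 100)]

def apply_action(x, action):
    """Return x_{t+1} = a_t(x_t) for a single quantized parameter update."""
    i, step = action
    x_new = np.array(x, dtype=float)
    x_new[i] += step * delta[i]
    return x_new

def reward(success):
    """R(s, a, s') = -1 for every step, 0 once personalization succeeds (state s_hat)."""
    return 0.0 if success else -1.0

GAMMA = 0.99  # large discount factor favoring future over immediate rewards
```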

                     5.3.1.2 Learning model behavior through exploration
                        To learn the behavior of the model f when its parameter val-
                     ues are changed, we perform model exploration episodes. Each
                     episode is initiated with random, but physiologically plausible
                     model parameters x_t ∈ Ω and one of the available training pa-
                     tients. From the outputs of a forward model run, y_t = f(x_t), the
                     misfits to the patient's corresponding measurements z are com-
                     puted, yielding the objectives vector c_t = c(y_t, z). We then em-
                     ploy a random exploration policy π_rand, i.e., a policy that selects
                     random actions according to a discrete uniform probability dis-
                     tribution. The selected action is applied to the current parame-
                     ter vector, yielding x_{t+1} = a_t(x_t). The forward model is computed,
                     y_{t+1} = f(x_{t+1}), yielding c_{t+1}. The next action a_{t+1} is selected ac-
                     cording to π_rand, and so on. This process is repeated n_e-steps = 100
                     times in our experiments to complete one episode, which can be
                     written as:

                        {(x_t, y_t, c_t, a_t, x_{t+1}, y_{t+1}, c_{t+1}), t = 0, ..., n_e-steps − 1}.   (5.2)
                     Many such episodes are typically generated per available training
                     patient and later combined into one multi-patient set of episodes,
                     denoted E. It is important to have multiple training patients to
                     allow abstraction from patient-specific to model-specific knowl-
                     edge.
                        Note that this is done only once, at training time, and is not
                     repeated for each personalization procedure. Exploration can
                     therefore be as extensive as needed.
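As a rough illustration of how the exploration set E could be assembled, the following hedged Python sketch generates one episode with the random policy and reuses the ACTIONS list and apply_action helper from the previous sketch. The forward model f, the measurements z, the plausible-parameter sampler, and the simple misfit form c(y, z) = y − z are assumptions made only for this example; the text does not prescribe this API.

```python
import numpy as np

def run_exploration_episode(f, z, sample_plausible_params, n_e_steps=100, rng=None):
    """Generate one exploration episode with the random policy pi_rand.

    All argument names are hypothetical placeholders for this sketch.
    """
    if rng is None:
        rng = np.random.default_rng()
    episode = []
    x = sample_plausible_params(rng)   # random but physiologically plausible x_0 in Omega
    y = f(x)                           # forward model run y_t = f(x_t)
    c = y - z                          # objectives c_t = c(y_t, z); misfit form assumed here
    for _ in range(n_e_steps):
        a = ACTIONS[rng.integers(len(ACTIONS))]  # pi_rand: discrete uniform over actions
        x_next = apply_action(x, a)              # x_{t+1} = a_t(x_t)
        y_next = f(x_next)                       # y_{t+1} = f(x_{t+1})
        c_next = y_next - z                      # c_{t+1}
        episode.append((x, y, c, a, x_next, y_next, c_next))  # one tuple of Eq. (5.2)
        x, y, c = x_next, y_next, c_next
    return episode

# Many such episodes per training patient would be combined into the
# multi-patient set E (patient data and samplers below are placeholders):
# E = [run_exploration_episode(f_p, z_p, sampler_p)
#      for (f_p, z_p, sampler_p) in patients
#      for _ in range(n_episodes_per_patient)]
```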