Page 199 - Artificial Intelligence for Computational Modeling of the Heart
Chapter 5 Machine learning methods for robust parameter estimation 171
where all convergence criteria are met. The overall goal is to
learn how to reach that state.
• Actions modify the parameters $\mathbf{x}$ to fulfill the objectives
$\mathbf{c}$. An action $a \in A$ consists of either incrementing or
decrementing one parameter $x_i$ by 1×, 10×, or 100× a user-specified
reference value $\delta_i \in \boldsymbol{\delta}$. This quantization of
the intrinsically continuous action space can be defined empirically.
• Transition function encodes learnt knowledge about the computational
model $f$; see the following paragraphs for details.
• Rewards are defined as $R(s, a, s') = -1$ (punishment), except when
the action performed results in personalization success, in which
case $R(\cdot, \cdot, \hat{s}) = 0$.
• Discount factor is designed to encourage finding a policy that favors
future over immediate rewards, which helps avoid local minima; a large
value, $\gamma = 0.99$, is therefore used.
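The action quantization, reward, and discounting described in the list above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_action_set`, `apply_action`, `reward`, `discounted_return`) and the representation of an action as an (index, signed step) pair are assumptions made only for illustration.

```python
from itertools import product


def build_action_set(deltas, multipliers=(1, 10, 100)):
    """Enumerate discrete actions as (parameter index, signed step) pairs.

    Assumption: deltas[i] is the user-specified reference value for
    parameter i; each parameter gets 2 * len(multipliers) actions.
    """
    actions = []
    for i, delta in enumerate(deltas):
        for m, sign in product(multipliers, (+1, -1)):
            actions.append((i, sign * m * delta))
    return actions


def apply_action(x, action):
    """Return a copy of x with one parameter incremented or decremented."""
    i, step = action
    x_next = list(x)
    x_next[i] += step
    return x_next


def reward(next_state_is_success):
    """0 when the action reaches the success state, -1 otherwise."""
    return 0.0 if next_state_is_success else -1.0


def discounted_return(rewards, gamma=0.99):
    """Sum of rewards, each weighted by gamma to the power of its step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With this reward, every step that does not end in success costs 1, so a sequence that reaches success sooner has a higher (less negative) discounted return. For two parameters the sketch yields 12 actions, i.e. 2 parameters × 3 step sizes × 2 directions.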
5.3.1.2 Learning model behavior through exploration
To learn the behavior of the model $f$ when its parameter values
are changed, we perform model exploration episodes. Each
episode is initiated with random, but physiologically plausible,
model parameters $\mathbf{x}_t \in \Omega$ and one of the available
training patients. From the outputs of a forward model run
$\mathbf{y}_t = f(\mathbf{x}_t)$, the misfits to the patient’s
corresponding measurements $\mathbf{z}$ are computed, yielding the
objectives vector $\mathbf{c}_t = c(\mathbf{y}_t, \mathbf{z})$. We then
employ a random exploration policy $\pi_{\text{rand}}$, i.e., a policy
that selects random actions according to a discrete uniform probability
distribution. The selected action is applied to the current parameter
vector, yielding $\mathbf{x}_{t+1} = a_t(\mathbf{x}_t)$. The forward
model is computed, $\mathbf{y}_{t+1} = f(\mathbf{x}_{t+1})$, yielding
$\mathbf{c}_{t+1}$. The next action $a_{t+1}$ is selected according to
$\pi_{\text{rand}}$, and so on. This process is repeated
$n_{\text{e-steps}} = 100$ times in our experiments to complete one
episode, which can be written as:
$$\{(\mathbf{x}_t, \mathbf{y}_t, \mathbf{c}_t, a_t, \mathbf{x}_{t+1}, \mathbf{y}_{t+1}, \mathbf{c}_{t+1}),\ t = 0, \dots, n_{\text{e-steps}} - 1\}. \tag{5.2}$$
Many such episodes are usually generated per available training
patient and later combined into one multi-patient set of episodes,
denoted E. Multiple training patients are important because they
allow abstraction from patient-specific to model-specific knowledge.
Note that exploration is performed once, at training time, and is
not repeated for each personalization procedure. Exploration can
therefore be as extensive as needed.
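The exploration procedure of this section can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: `toy_forward_model` is a hypothetical stand-in for the computational model f, `objectives` uses a plain signed difference as the misfit, the parameter domain Ω is assumed to be a box given by `bounds`, and actions are assumed to be (parameter index, signed step) pairs. The sketch also does not constrain parameters to stay inside Ω after an action.

```python
import random


def toy_forward_model(x):
    """Hypothetical stand-in for the forward model f (two outputs)."""
    return [x[0] + x[1], x[0] * x[1]]


def objectives(y, z):
    """Assumed misfit: signed difference between outputs and measurements."""
    return [yi - zi for yi, zi in zip(y, z)]


def run_episode(f, z, x0, actions, n_steps=100, rng=None):
    """Generate one episode of (x_t, y_t, c_t, a_t, x_t+1, y_t+1, c_t+1)."""
    rng = rng or random.Random()
    x = list(x0)
    y = f(x)
    c = objectives(y, z)
    episode = []
    for _ in range(n_steps):
        a = rng.choice(actions)      # random policy: uniform over actions
        i, step = a
        x_next = list(x)
        x_next[i] += step            # apply the action to one parameter
        y_next = f(x_next)           # forward model run
        c_next = objectives(y_next, z)
        episode.append((x, y, c, a, x_next, y_next, c_next))
        x, y, c = x_next, y_next, c_next
    return episode


def collect_episodes(f, patients, bounds, actions,
                     episodes_per_patient, n_steps=100, rng=None):
    """Pool episodes from all training patients into one set E."""
    rng = rng or random.Random()
    E = []
    for z in patients.values():
        for _ in range(episodes_per_patient):
            # Random initial parameters inside the assumed box domain.
            x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
            E.append(run_episode(f, z, x0, actions, n_steps, rng))
    return E
```

A call such as `collect_episodes(toy_forward_model, {"p1": [3.0, 2.0]}, [(0.0, 5.0), (0.0, 5.0)], actions, episodes_per_patient=3)` would return three episodes of 100 transitions each. Because each transition stores the successor state, consecutive tuples in an episode share their boundary values.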