challenge, we propose to reformulate the anatomical object detection problem as a search task for an artificial agent. Using elements of reinforcement learning and deep learning [261], we design an algorithm to teach artificial agents optimal navigation trajectories through the image space towards the anatomical structures of interest [262–265].

                                         3.2.2.1 Learning to search for anatomical objects
In contrast to exhaustive search, an artificial agent learns how to find a structure or landmark by navigating through the space of a given image I : Z³ → ℝ. A Markov Decision Process (MDP) [266] models the dynamics of the navigation: M := (S, A, T, F, γ). The components are as follows: S is a finite set of states, s_t ∈ S being the state of the agent at time t (we define this based on the image context around the position of the agent at time t); A is a finite set of actions for voxel-wise navigation, i.e., ±1 voxel along each image axis; T : S × A × S → [0, 1] is the stochastic transition function; F : S × A × S → ℝ is the reward (feedback) function used as incentive for behavior learning (F(s, a, s′) = ‖p_t − p_GT‖₂² − ‖p_{t+1} − p_GT‖₂² defines the expected distance-based reward for transitioning from state s to state s′, i.e., from point p_t to p_{t+1}, while searching for the location of the target structure); and γ ∈ (0, 1) is the discount factor, balancing immediate and future rewards [265].
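For illustration, a minimal Python sketch of these components is given below. The six-action set, the patch half-width, and the NumPy-based indexing are our own illustrative assumptions (border handling is omitted), not the implementation described in [265].

import numpy as np

# Six voxel-wise navigation actions: +/-1 voxel along each image axis.
ACTIONS = np.array([[ 1, 0, 0], [-1, 0, 0],
                    [ 0, 1, 0], [ 0, -1, 0],
                    [ 0, 0, 1], [ 0, 0, -1]], dtype=int)

def get_state(image, p, half=12):
    # State s_t: axis-aligned box of intensities centered at the agent position p
    # (border handling omitted for brevity).
    z, y, x = p
    return image[z - half:z + half + 1,
                 y - half:y + half + 1,
                 x - half:x + half + 1].astype(np.float32)

def reward(p_t, p_next, p_gt):
    # Distance-based reward: decrease in squared distance to the target position p_gt,
    # positive when the agent moves closer to the target.
    d_t = np.sum((np.asarray(p_t) - np.asarray(p_gt)) ** 2)
    d_next = np.sum((np.asarray(p_next) - np.asarray(p_gt)) ** 2)
    return float(d_t - d_next)

def step(image, p_t, action_idx, p_gt):
    # Apply one navigation action; return the next position, next state, and reward.
    p_next = tuple(np.asarray(p_t) + ACTIONS[action_idx])
    return p_next, get_state(image, p_next), reward(p_t, p_next, p_gt)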
   Based on these components, we define the optimal action-value function Q* : S × A → ℝ that measures the maximum expected discounted future reward R_t of an optimal navigation policy π: Q*(s, a) = max_π E[R_t | s_t = s, a_t = a, π]. One can derive a recursive form of this function, also referred to as the Bellman optimality criterion [266]: Q*(s, a) = E_{s′}[r + γ max_{a′} Q*(s′, a′)], where r is the reward for transitioning from s to s′. We propose to use a parametric model in the form of a deep neural network to approximate this function: Q*(s, a) ≈ Q(s, a; θ), where θ denotes the model parameters. Based on Q-learning [267,268], one can learn an effective image navigation strategy for finding anatomical structures with maximum reward [261,265]. More details are provided in [265].
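To illustrate how the Bellman optimality criterion yields a training signal for the approximator Q(s, a; θ), the following sketch computes a Q-learning target and loss for a batch of transitions. The small fully connected architecture, the PyTorch framework, and the use of a single network for both prediction and target are simplifying assumptions made here for exposition, not the design reported in [265].

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Approximator Q(s, a; theta) over flattened state patches; one output per action.
    def __init__(self, state_dim, num_actions=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, s):
        return self.net(s)

def q_learning_loss(q_net, batch, gamma=0.9):
    # Bellman-target loss for transitions (s, a, r, s_next):
    #   target = r + gamma * max_a' Q(s', a'; theta), compared against Q(s, a; theta).
    s, a, r, s_next = batch            # s, s_next: [B, state_dim]; a: [B] (int64); r: [B]
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * q_net(s_next).max(dim=1).values
    return nn.functional.smooth_l1_loss(q_sa, target)

In practice, deep Q-learning of this kind is commonly stabilized with techniques such as experience replay and a separate, periodically updated target network.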
                                         3.2.2.2 Extending to multi-scale search
   To ensure the scalability of this strategy to high-resolution (incomplete) volumetric scans, we propose to model the search as a multi-scale navigation process over a discrete scale-space of a given image I. In this context, we redefine the states and actions of the Markov Decision Process M as follows: s_t encodes the local image context around the current agent position as an axis-aligned box of image intensities. For a given scale level m, 0 ≤ m < M, the image is represented as an instance L_d(m) of an