challenge, we propose to reformulate the anatomical object detection problem as a search task for an artificial agent. Using elements of reinforcement learning and deep learning [261], we design an algorithm to teach artificial agents optimal navigation trajectories through the image space towards the anatomical structures of interest [262–265].

                                         3.2.2.1 Learning to search for anatomical objects
In contrast to exhaustive search, an artificial agent learns how to find a structure or landmark by navigating through the space of a given image I : Z³ → ℝ. A Markov Decision Process (MDP) [266] models the dynamics of the navigation: M := (S, A, T, F, γ). The components are as follows: S is a finite set of states, s_t ∈ S being the state of the agent at time t (we define this based on the image context around the position of the agent at time t); A is a finite set of actions for voxel-wise navigation, i.e., ±1 voxel along each image axis; T : S × A × S → [0, 1] is the stochastic transition function; F : S × A × S → ℝ is the reward (feedback) function used as incentive for behavior learning (F(s, a, s′) = ‖p_t − p_GT‖₂² − ‖p_{t+1} − p_GT‖₂² defines the expected distance-based reward for transitioning from state s to state s′, i.e., from point p_t to p_{t+1}, while searching for the location of the target structure); and γ ∈ (0, 1) is the discount factor, balancing immediate and future rewards [265].
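For illustration, a minimal Python sketch of these components is given below. The six-action set, the patch half-width, and the NumPy-based indexing are our own illustrative assumptions (border handling is omitted), not the implementation described in [265].

import numpy as np

# Six voxel-wise navigation actions: +/-1 voxel along each image axis.
ACTIONS = np.array([[ 1, 0, 0], [-1, 0, 0],
                    [ 0, 1, 0], [ 0, -1, 0],
                    [ 0, 0, 1], [ 0, 0, -1]], dtype=int)

def get_state(image, p, half=12):
    # State s_t: axis-aligned box of intensities centered at the agent position p
    # (border handling omitted for brevity).
    z, y, x = p
    return image[z - half:z + half + 1,
                 y - half:y + half + 1,
                 x - half:x + half + 1].astype(np.float32)

def reward(p_t, p_next, p_gt):
    # Distance-based reward: decrease in squared distance to the target position p_gt,
    # positive when the agent moves closer to the target.
    d_t = np.sum((np.asarray(p_t) - np.asarray(p_gt)) ** 2)
    d_next = np.sum((np.asarray(p_next) - np.asarray(p_gt)) ** 2)
    return float(d_t - d_next)

def step(image, p_t, action_idx, p_gt):
    # Apply one navigation action; return the next position, next state, and reward.
    p_next = tuple(np.asarray(p_t) + ACTIONS[action_idx])
    return p_next, get_state(image, p_next), reward(p_t, p_next, p_gt)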
   Based on these components, we define the optimal action-value function Q* : S × A → ℝ that measures the maximum expected discounted future reward R_t of an optimal navigation policy π: Q*(s, a) = max_π E[R_t | s_t = s, a_t = a, π]. One can derive a recursive form of this function, also referred to as the Bellman optimality criterion [266]: Q*(s, a) = E_{s′}[r + γ max_{a′} Q*(s′, a′)], where r is the reward for transitioning from s to s′. We propose to use a parametric model in the form of a deep neural network to approximate this function: Q*(s, a) ≈ Q(s, a; θ), where θ denotes the model parameters. Based on Q-learning [267,268], one can learn an effective image navigation strategy for finding anatomical structures with maximum reward [261,265]. More details are provided in [265].
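To illustrate how the Bellman optimality criterion yields a training signal for the approximator Q(s, a; θ), the following sketch computes a Q-learning target and loss for a batch of transitions. The small fully connected architecture, the PyTorch framework, and the use of a single network for both prediction and target are simplifying assumptions made here for exposition, not the design reported in [265].

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Approximator Q(s, a; theta) over flattened state patches; one output per action.
    def __init__(self, state_dim, num_actions=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, s):
        return self.net(s)

def q_learning_loss(q_net, batch, gamma=0.9):
    # Bellman-target loss for transitions (s, a, r, s_next):
    #   target = r + gamma * max_a' Q(s', a'; theta), compared against Q(s, a; theta).
    s, a, r, s_next = batch            # s, s_next: [B, state_dim]; a: [B] (int64); r: [B]
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * q_net(s_next).max(dim=1).values
    return nn.functional.smooth_l1_loss(q_sa, target)

In practice, deep Q-learning of this kind is commonly stabilized with techniques such as experience replay and a separate, periodically updated target network.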
                                         3.2.2.2 Extending to multi-scale search
   To ensure the scalability of this strategy to high-resolution (incomplete) volumetric scans, we propose to model the search as a multi-scale navigation process over a discrete scale-space of a given image I. In this context, we redefine the states and actions of the Markov Decision Process M as follows: s_t encodes the local image context around the current agent position as an axis-aligned box of image intensities. For a given scale level m, 0 ≤ m < M, the image is represented as an instance L_d(m) of an