5.3.1.3 From computed objectives to representative MDP state

The continuous space of objective vectors now needs to be quantized into a finite set of representative MDP states $S$ using a data-driven approach. To this end, all objective vectors $c$ that were observed during exploration (as part of the episodes in $E$) are grouped into $n_S - 1$ clusters based on their distance to each other. The distance metric is defined relative to the inverse of the thresholds in the convergence criteria to ensure a similar influence of all objectives (e.g., to cancel out different units):



$$\| c_1 - c_2 \|_{\psi} = \sqrt{(c_1 - c_2)^{\top} \, \mathrm{diag}(\psi)^{-1} \, (c_1 - c_2)}. \qquad (5.3)$$
The centroid of each cluster becomes the centroid of a representative state, and the special "success state" mentioned earlier, denoted $\hat{s}$, is artificially created to cover the region in state space where all objectives are met: $\forall i : |c_i| < \psi_i$. This results in a total of $n_S$ states: $n_S - 1$ are data-driven, and one is the success state.
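As an illustrative sketch of this clustering step (not part of the original method), dividing each objective by the square root of its threshold turns the $\psi$-weighted distance of Eq. (5.3) into ordinary Euclidean distance, so a standard k-means implementation can produce the $n_S - 1$ data-driven centroids. The arrays C and psi, the value of n_S, and the use of scikit-learn are placeholder assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative placeholders: in practice C holds the objective vectors
# collected during exploration (episodes E) and psi the convergence thresholds.
rng = np.random.default_rng(0)
C = rng.normal(size=(5000, 3))        # observed objective vectors c
psi = np.array([0.5, 2.0, 0.1])       # convergence thresholds (assumed values)
n_S = 50                              # total number of MDP states (assumed)

# Dividing each objective by sqrt(psi_i) makes the psi-weighted distance of
# Eq. (5.3) equal to plain Euclidean distance, so standard k-means applies.
C_scaled = C / np.sqrt(psi)

km = KMeans(n_clusters=n_S - 1, n_init=10, random_state=0).fit(C_scaled)
centroids = km.cluster_centers_ * np.sqrt(psi)   # centroids xi_s in original units
```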
To determine the MDP state of a given objective vector $c$, we introduce a mapping $\phi$. Let $\xi_s$ denote the centroid corresponding to state $s$; the mapping is then defined as:

$$\phi(c) = \operatorname*{argmin}_{s \in S} \, \| c - \xi_s \|_{\psi}. \qquad (5.4)$$
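A minimal sketch of the mapping $\phi$ of Eq. (5.4) could look as follows, assuming the convention that the success state is indexed last and is assigned whenever all objectives fall within their thresholds (the original text does not fix this convention):

```python
import numpy as np

def phi(c, centroids, psi):
    """Map an objective vector c to its representative MDP state, Eq. (5.4).

    Assumed convention: indices 0 .. n_S - 2 are the data-driven clusters;
    index n_S - 1 denotes the success state s_hat.
    """
    # Success state: every objective lies within its convergence threshold.
    if np.all(np.abs(c) < psi):
        return len(centroids)
    # Otherwise: nearest data-driven centroid under the psi-weighted distance.
    d = np.sqrt(np.sum((c - centroids) ** 2 / psi, axis=1))
    return int(np.argmin(d))
```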
5.3.1.4 Transition function as probabilistic model representation

In this work, the stochastic MDP transition function $T$ is generated such that the transition probabilities encode the learnt knowledge about the behavior of the computational model $f$. To this end, we rely on model exploration and the resulting training episodes $E$. First, the individual samples $(x_t, y_t, c_t, a_t, x_{t+1}, y_{t+1}, c_{t+1})$ are converted to state-action-state transition tuples $\hat{E} = \{(s, a, s')\}$, where $s = \phi(c_t)$, $a = a_t$, and $s' = \phi(c_{t+1})$. The transition function is then approximated at each point based on statistical analysis of the observed transition samples:
$$T(s, a, s') = \frac{\left|\{(s, a, s') \in \hat{E}\}\right|}{\sum_{s'' \in S} \left|\{(s, a, s'') \in \hat{E}\}\right|}. \qquad (5.5)$$
Some state-action combinations may not be observed, especially if $n_S$ and $n_A$ are large. In such cases, a uniform probability is assigned.
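A possible implementation of the empirical estimate in Eq. (5.5), including the uniform fallback for unobserved state-action pairs, is sketched below; the function name and the dense $(n_S \times n_A \times n_S)$ table layout are illustrative assumptions:

```python
import numpy as np
from collections import Counter

def estimate_transition_function(transitions, n_S, n_A):
    """Empirical estimate of T(s, a, s') from the tuples in E_hat, Eq. (5.5).

    transitions: iterable of (s, a, s_next) index triples.
    Unobserved (s, a) pairs fall back to a uniform distribution over states.
    """
    T = np.zeros((n_S, n_A, n_S))
    for (s, a, s_next), count in Counter(transitions).items():
        T[s, a, s_next] = count                   # numerator of Eq. (5.5)
    totals = T.sum(axis=2, keepdims=True)         # denominator of Eq. (5.5)
    observed = totals.squeeze(-1) > 0
    T[observed] /= totals[observed]
    T[~observed] = 1.0 / n_S                      # uniform for unseen (s, a)
    return T
```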
Now that the MDP model $M$ is fully defined, we apply value iteration (section 5.3.1) and compute the stochastic policy $\tilde{\pi}^*$, which completes the off-line phase.
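For orientation, a generic tabular value-iteration sketch over such a learnt transition function is shown below. The reward table R, the discount factor, and the greedy policy extraction are placeholders; the stochastic policy $\tilde{\pi}^*$ of the text would be derived from the same action values.

```python
import numpy as np

def value_iteration(T, R, gamma=0.9, tol=1e-8):
    """Generic tabular value iteration on a learnt MDP.

    T: (n_S, n_A, n_S) transition probabilities, R: (n_S, n_A) rewards
    (both placeholders here). Returns a greedy policy and the state values.
    """
    n_S, n_A = R.shape
    V = np.zeros(n_S)
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_s' T[s, a, s'] * V[s']
        Q = R + gamma * (T @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmax(axis=1), V
```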