
46 Reinforcement learning example

             Suppose you are using machine learning to teach a helicopter to fly complex maneuvers.
             Here is a time-lapse photo of a computer-controlled helicopter executing a landing with the
             engine turned off.


             This is called an “autorotation” maneuver. It allows helicopters to land even if their engine
             unexpectedly fails. Human pilots practice this maneuver as part of their training. Your goal
             is to use a learning algorithm to fly the helicopter through a trajectory T that ends in a safe
             landing.


             To apply reinforcement learning, you have to develop a “reward function” R(.) that gives a
             score measuring how good each possible trajectory T is. For example, if T results in the
             helicopter crashing, then perhaps the reward is R(T) = -1,000, a huge negative reward. A
             trajectory T resulting in a safe landing might result in a positive R(T), with the exact value
             depending on how smooth the landing was. The reward function R(.) is typically chosen by
             hand to quantify how desirable different trajectories T are. It has to trade off how bumpy the
             landing was, whether the helicopter landed in exactly the desired spot, how rough the ride
             down was for passengers, and so on. It is not easy to design good reward functions.
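
To make the trade-off concrete, here is a minimal sketch of what a hand-designed reward function might look like. Only the crash reward of -1,000 comes from the text above. The `Trajectory` fields, the weights, the safe-landing bonus, and the floor that keeps a safe landing positive are all hypothetical choices made for illustration; a real helicopter controller would need far more careful measurement and tuning.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical summary of one flight; every field is an illustrative assumption."""
    crashed: bool
    touchdown_speed: float  # vertical speed at touchdown (m/s), a proxy for bumpiness
    landing_error: float    # distance from the desired landing spot (m)
    ride_roughness: float   # average jolt felt by passengers on the way down (m/s^2)


CRASH_REWARD = -1000.0      # the "huge negative reward" from the text
SAFE_LANDING_BONUS = 100.0  # assumed base reward for any safe landing
MIN_SAFE_REWARD = 1.0       # assumed floor so a safe landing always stays positive


def reward(t: Trajectory, w_bump: float = 10.0, w_spot: float = 2.0,
           w_ride: float = 5.0) -> float:
    """Score a trajectory; the weights are hand-picked, not learned."""
    if t.crashed:
        return CRASH_REWARD
    # Weighted sum of the undesirable aspects of an otherwise safe landing.
    penalty = (w_bump * t.touchdown_speed
               + w_spot * t.landing_error
               + w_ride * t.ride_roughness)
    return max(MIN_SAFE_REWARD, SAFE_LANDING_BONUS - penalty)


smooth = Trajectory(crashed=False, touchdown_speed=0.5, landing_error=1.0, ride_roughness=0.2)
rough = Trajectory(crashed=False, touchdown_speed=3.0, landing_error=10.0, ride_roughness=2.0)
crash = Trajectory(crashed=True, touchdown_speed=15.0, landing_error=40.0, ride_roughness=9.0)

for name, t in [("smooth", smooth), ("rough", rough), ("crash", crash)]:
    print(name, reward(t))
```

Changing the three weights changes which landings the function prefers, for example whether a gentle touchdown far from the target beats a harder one on the exact spot. Choosing those weights by hand is part of what makes reward design difficult.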




             Page 88                            Machine Learning Yearning-Draft                       Andrew Ng