46 Reinforcement learning example

Suppose you are using machine learning to teach a helicopter to fly complex maneuvers.
Here is a time-lapse photo of a computer-controlled helicopter executing a landing with the
engine turned off.

This is called an “autorotation” maneuver. It allows helicopters to land even if their engine
unexpectedly fails. Human pilots practice this maneuver as part of their training. Your goal
is to use a learning algorithm to fly the helicopter through a trajectory T that ends in a safe
landing.

To apply reinforcement learning, you have to develop a “reward function” R(.) that gives a
score measuring how good each possible trajectory T is. For example, if T results in the
helicopter crashing, then perhaps the reward is R(T) = -1,000, a huge negative reward. A
trajectory T resulting in a safe landing might result in a positive R(T), with the exact value
depending on how smooth the landing was. The reward function R(.) is typically chosen by
hand to quantify how desirable different trajectories T are. It has to trade off how bumpy the
landing was, whether the helicopter landed in exactly the desired spot, how rough the ride
down was for passengers, and so on. It is not easy to design good reward functions.
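
To make the trade-off concrete, here is a minimal sketch of what a hand-designed reward function might look like. Only the crash reward of -1,000 comes from the text above. The Trajectory fields, the weights, and the base score of 100 are hypothetical values chosen for illustration, not the ones used for the actual helicopter.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical summary of one flight; all field names are assumptions."""
    crashed: bool
    touchdown_speed: float  # vertical speed at touchdown (bumpiness of landing)
    landing_error: float    # distance from the desired landing spot
    ride_roughness: float   # how rough the ride down was for passengers


def reward(t: Trajectory,
           w_bump: float = 10.0,
           w_spot: float = 5.0,
           w_ride: float = 2.0) -> float:
    """Score a trajectory T; the weights are illustrative, hand-picked guesses."""
    if t.crashed:
        return -1000.0  # huge negative reward for a crash, as in the text
    # Weighted sum of the things we want to keep small.
    penalty = (w_bump * t.touchdown_speed
               + w_spot * t.landing_error
               + w_ride * t.ride_roughness)
    # Keep every safe landing positive, so it always beats a crash.
    return max(1.0, 100.0 - penalty)
```

Changing any one weight changes which landings the learning algorithm prefers. For example, raising w_spot favors hitting the target over a gentle ride. Picking these numbers by hand is part of why good reward functions are hard to design.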

Page 88 Machine Learning Yearning-Draft Andrew Ng
