10 Having a dev set and metric speeds up iterations
It is very difficult to know in advance what approach will work best for a new problem. Even
experienced machine learning researchers will usually try out many dozens of ideas before
they discover something satisfactory. When building a machine learning system, I will often:
1. Start off with some idea on how to build the system.
2. Implement the idea in code.
3. Carry out an experiment which tells me how well the idea worked. (Usually my first few
ideas don’t work!) Based on these learnings, go back to generate more ideas, and keep on
iterating.
This is an iterative process. The faster you can go round this loop, the faster you will make
progress. This is why having dev/test sets and a metric is important: Each time you try an
idea, measuring its performance on the dev set lets you quickly decide if you’re
heading in the right direction.
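
As a minimal sketch of this loop, assume the metric is classification accuracy and the dev set is a small fixed list of labeled examples. The single "pointiness" feature, the labels, and the two threshold classifiers `idea_a` and `idea_b` are all hypothetical, chosen only to keep the example self-contained. The point is that every idea is scored by the same function on the same dev set, so comparing ideas takes seconds.

```python
from typing import Callable, Dict, Sequence


def dev_accuracy(
    classifier: Callable[[float], int],
    dev_examples: Sequence[float],
    dev_labels: Sequence[int],
) -> float:
    """Fraction of dev-set examples the classifier labels correctly."""
    correct = sum(
        1 for x, y in zip(dev_examples, dev_labels) if classifier(x) == y
    )
    return correct / len(dev_labels)


# Hypothetical dev set: one made-up "pointiness" feature per image,
# with label 1 = cat and 0 = not a cat.
dev_examples = [0.9, 0.8, 0.2, 0.4, 0.7, 0.1, 0.6, 0.3]
dev_labels = [1, 1, 0, 1, 1, 0, 0, 0]


# Two hypothetical ideas, each a simple threshold on the feature.
def idea_a(x: float) -> int:
    return int(x > 0.5)


def idea_b(x: float) -> int:
    return int(x > 0.35)


ideas: Dict[str, Callable[[float], int]] = {"idea_a": idea_a, "idea_b": idea_b}

# Score every idea with the same metric on the same dev set.
scores = {
    name: dev_accuracy(clf, dev_examples, dev_labels)
    for name, clf in ideas.items()
}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: dev accuracy = {score:.3f}")
print("best idea so far:", best)
```

In a real project the classifiers would be trained models and the dev set would be far larger, but the structure stays the same: one fixed dev set, one number per idea.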
In contrast, suppose you don’t have a specific dev set and metric. Then each time your team
develops a new cat classifier, you have to incorporate it into your app and play with the app
for a few hours to get a sense of whether the new classifier is an improvement. This would be
incredibly slow! Also, if your team improves the classifier’s accuracy from 95.0% to 95.1%,
you might not be able to detect that 0.1% improvement from playing with the app. Yet a lot
of progress in your system will be made by gradually accumulating dozens of these 0.1%
improvements. Having a dev set and metric allows you to very quickly detect which ideas are
successfully giving you small (or large) improvements, and therefore lets you quickly decide
what ideas to keep refining, and which ones to discard.
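
To make the 0.1% case concrete, here is a small sketch of the keep-or-discard decision. It assumes a dev set of 10,000 examples, where 95.0% accuracy means 9,500 correct predictions and a 0.1% improvement means just 10 more. The dev set size, the idea names, and the counts of correct predictions are all hypothetical; they are illustrative numbers, not measured results.

```python
DEV_SET_SIZE = 10_000  # assumed dev set size, for illustration only

# Baseline classifier: 95.0% accuracy on the dev set.
baseline_correct = 9_500

# Hypothetical experiments: (idea name, correct predictions on the dev set).
experiments = [
    ("idea_1", 9_510),  # 95.10%
    ("idea_2", 9_495),  # 94.95%
    ("idea_3", 9_522),  # 95.22%
]

best_correct = baseline_correct
kept = []
discarded = []

for name, correct in experiments:
    if correct > best_correct:
        # The idea beats the current best on the dev set: keep refining it.
        kept.append(name)
        best_correct = correct
    else:
        # No improvement on the dev set: discard it.
        discarded.append(name)

print("kept:", kept)
print("discarded:", discarded)
print(f"dev accuracy: {baseline_correct / DEV_SET_SIZE:.2%} -> "
      f"{best_correct / DEV_SET_SIZE:.2%}")
```

A difference of 10 correct predictions out of 10,000 would be nearly impossible to notice by playing with the app, yet it shows up immediately as a number on the dev set.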
Page 24 Machine Learning Yearning-Draft Andrew Ng