12 Takeaways: Setting up development and test sets
• Choose dev and test sets from a distribution that reflects what data you expect to get in
the future and want to do well on. This may not be the same as your training data’s
distribution.
• Choose dev and test sets from the same distribution if possible.
• Choose a single-number evaluation metric for your team to optimize. If there are multiple
goals that you care about, consider combining them into a single formula (such as
averaging multiple error metrics) or defining satisficing and optimizing metrics.
• Machine learning is a highly iterative process: You may try many dozens of ideas before
finding one that you’re satisfied with.
• Having dev/test sets and a single-number evaluation metric helps you quickly evaluate
algorithms, and therefore iterate faster.
• When starting out on a brand new application, try to establish dev/test sets and a metric
quickly, say in less than a week. It might be okay to take longer on mature applications.
• The old heuristic of a 70%/30% train/test split does not apply to problems where you
have a lot of data; the dev and test sets can be much smaller than 30% of the data.
• Your dev set should be large enough to detect meaningful changes in the accuracy of your
algorithm, but not necessarily much larger. Your test set should be big enough to give you
a confident estimate of the final performance of your system.
• If your dev set and metric are no longer pointing your team in the right direction, quickly
change them: (i) If you have overfit the dev set, get more dev set data. (ii) If the actual
distribution you care about is different from the dev/test set distribution, get new
dev/test set data. (iii) If your metric is no longer measuring what is most important to
you, change the metric.
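The takeaway about satisficing and optimizing metrics can be illustrated with a minimal Python sketch. It assumes accuracy is the optimizing metric and latency is the satisficing one; the `Candidate` record, the `pick_best` and `mean_error` helpers, and the latency threshold are hypothetical names chosen for illustration, not something prescribed above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record of one model's dev-set results."""
    name: str
    accuracy: float    # optimizing metric: higher is better
    latency_ms: float  # satisficing metric: only has to be "good enough"


def pick_best(candidates: Sequence[Candidate],
              max_latency_ms: float) -> Optional[Candidate]:
    """Return the most accurate candidate that meets the latency threshold.

    Returns None when no candidate satisfies the constraint.
    """
    feasible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.accuracy)


def mean_error(errors: Sequence[float]) -> float:
    """Combine several error metrics into one number by simple averaging."""
    if not errors:
        raise ValueError("need at least one error value")
    return sum(errors) / len(errors)
```

Either helper reduces a set of measurements to a single decision, which is what lets a team rank ideas quickly. Whether averaging or thresholding is appropriate depends on how the individual goals trade off in your application.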
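The takeaway about dev and test set sizes can be made concrete with some rough arithmetic. The sketch below is written under stated assumptions: it treats each prediction as an independent correct/incorrect trial and uses the standard binomial standard error as a gauge of confidence. The function names are hypothetical, and the binomial model is a simplification brought in for illustration rather than a method given above.

```python
import math


def accuracy_resolution(n_examples: int) -> float:
    """Smallest accuracy change a set of this size can register.

    Flipping one example from wrong to right moves accuracy by 1 / n.
    """
    if n_examples <= 0:
        raise ValueError("n_examples must be positive")
    return 1.0 / n_examples


def can_resolve(n_examples: int, min_change: float) -> bool:
    """True if a single example is worth no more than the change you care about.

    This is a necessary condition only; sampling noise usually calls for
    a larger set than this bound suggests.
    """
    return accuracy_resolution(n_examples) <= min_change


def accuracy_standard_error(accuracy: float, n_examples: int) -> float:
    """Binomial standard error of a measured accuracy (assumes i.i.d. examples)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    if n_examples <= 0:
        raise ValueError("n_examples must be positive")
    return math.sqrt(accuracy * (1.0 - accuracy) / n_examples)
```

For instance, `can_resolve(100, 0.001)` is false because one example out of 100 already shifts accuracy by a full percentage point, while `can_resolve(10_000, 0.001)` is true. The standard error gives a rough sense of how tightly a test set of a given size pins down final performance.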
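When data is plentiful, the dev and test sets can be specified by absolute size instead of a fixed 30% share. The following is a minimal sketch, assuming every example already comes from the distribution you want to do well on; if your training data comes from a different source, draw only the dev and test sets from the target pool. The function name and parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_dev_test(
    examples: Sequence[T],
    dev_size: int,
    test_size: int,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once, then carve off fixed-size dev and test sets.

    Dev and test are drawn from the same shuffled pool, so they share
    a distribution; everything left over becomes training data.
    """
    if dev_size < 0 or test_size < 0:
        raise ValueError("sizes must be non-negative")
    if dev_size + test_size > len(examples):
        raise ValueError("dev_size + test_size exceeds the number of examples")

    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible

    dev = shuffled[:dev_size]
    test = shuffled[dev_size:dev_size + test_size]
    train = shuffled[dev_size + test_size:]
    return train, dev, test
```

Fixing the seed keeps the dev and test sets stable across experiments, so changes in the metric reflect changes in the algorithm and not a reshuffled split.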
Machine Learning Yearning-Draft Andrew Ng