
11 When to change dev/test sets and metrics




             When starting out on a new project, I try to quickly choose dev/test sets, since this gives the
             team a well-defined target to aim for.

             I typically ask my teams to come up with an initial dev/test set and an initial metric in less
             than one week, rarely longer. It is better to come up with something imperfect and get going
             quickly than to overthink this. But this one-week timeline does not apply to mature
             applications. For example, anti-spam is a mature deep learning application. I have seen
             teams working on already-mature systems spend months acquiring even better dev/test
             sets.

             If you later realize that your initial dev/test set or metric missed the mark, by all means
             change them quickly. For example, if your dev set + metric ranks classifier A above classifier
             B, but your team thinks that classifier B is actually superior for your product, then this might
             be a sign that you need to change your dev/test sets or your evaluation metric.

             There are three main possible causes of the dev set/metric incorrectly rating classifier A
             higher:


             1. The actual distribution you need to do well on is different from the dev/test sets.

             Suppose your initial dev/test set had mainly pictures of adult cats. You ship your cat app,
             and find that users are uploading a lot more kitten images than expected. So, the dev/test set
             distribution is not representative of the actual distribution you need to do well on. In this
             case, update your dev/test sets to be more representative.
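
             One way to notice this kind of mismatch is to compare how often each category appears
             in the dev set with how often it appears in a labeled sample of what users actually upload.
             The sketch below is only an illustration under stated assumptions: the category names, the
             counts, the helper functions, and the 0.2 threshold are all hypothetical and do not come
             from a real system.

```python
from collections import Counter


def category_shares(labels):
    """Return the fraction of examples that fall in each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def share_gaps(dev_labels, production_labels):
    """Per-category gap: production share minus dev-set share."""
    dev = category_shares(dev_labels)
    prod = category_shares(production_labels)
    categories = set(dev) | set(prod)
    return {c: prod.get(c, 0.0) - dev.get(c, 0.0) for c in categories}


# Made-up counts for illustration only (assumption, not real data).
dev_labels = ["adult_cat"] * 90 + ["kitten"] * 10
production_labels = ["adult_cat"] * 50 + ["kitten"] * 50

gaps = share_gaps(dev_labels, production_labels)

# Assumed cutoff: flag categories that users upload far more often
# than the dev set would suggest.
THRESHOLD = 0.2
underrepresented = sorted(c for c, gap in gaps.items() if gap > THRESHOLD)
print(underrepresented)
```

             A category flagged this way is a candidate for adding more examples to the dev/test sets,
             so that they better reflect the distribution you need to do well on.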

             Page 25                            Machine Learning Yearning-Draft                       Andrew Ng