
3. The test set is not necessarily harder than the dev set, just different. So what works
                 well on the dev set does not work well on the test set. In this case, much of your work
                 to improve dev set performance might be wasted effort.

             Working on machine learning applications is hard enough. Having mismatched dev and test
             sets introduces additional uncertainty about whether improving on the dev set distribution
             also improves test set performance. This makes it harder to figure out what is and isn’t
             working, and thus harder to prioritize what to work on.

             If you are working on a third-party benchmark problem, its creator might have specified dev
             and test sets that come from different distributions. Luck, rather than skill, will have a
             greater impact on your performance on such benchmarks than it would if the dev and test
             sets came from the same distribution. It is an important research problem to develop
             learning algorithms that are trained on one distribution and generalize well to another. But if
             your goal is to make progress on a specific machine learning application rather than make
             research progress, I recommend trying to choose dev and test sets that are drawn from the
             same distribution. This will make your team more efficient.
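
             One simple way to get dev and test sets from the same distribution is to pool all of
             your held-out examples, shuffle them, and cut the shuffled pool in two. The sketch below
             illustrates that idea under stated assumptions: the function name split_dev_test, the
             dev_fraction and seed parameters, and the "mobile"/"web" example pool are all
             hypothetical, chosen only for illustration.

```python
import random


def split_dev_test(examples, dev_fraction=0.5, seed=0):
    """Shuffle one pool of held-out examples and cut it into dev and test sets.

    Both sets are random draws from the same pool, so they follow the same
    distribution. All names here are hypothetical, not from the text.
    """
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be strictly between 0 and 1")
    shuffled = list(examples)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[:n_dev], shuffled[n_dev:]


def mobile_share(subset):
    """Fraction of examples in `subset` that came from the 'mobile' source."""
    return sum(1 for source, _ in subset if source == "mobile") / len(subset)


# Hypothetical pool: each example is tagged with the source it came from.
pool = [("mobile", i) for i in range(600)] + [("web", i) for i in range(400)]
dev, test = split_dev_test(pool)

print(len(dev), len(test))  # 500 500
print(round(mobile_share(dev), 2), round(mobile_share(test), 2))
```

             Because the split is random, the share of each source in the dev set should be close
             to its share in the test set. Splitting by source instead (for example, all "mobile"
             examples in dev and all "web" examples in test) would create the kind of mismatch
             described above.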

             Page 18                            Machine Learning Yearning-Draft                       Andrew Ng