How about the Blackbox dev set? We previously said that dev sets of around 1,000-10,000
examples are common. To refine that statement, a Blackbox dev set of 1,000-10,000
examples will often give you enough data to tune hyperparameters and select among models,
though there is little harm in having even more data. A Blackbox dev set of 100 would be
small but still useful.
If you have a small dev set, then you might not have enough data to split into Eyeball and
Blackbox dev sets that are both large enough to serve their purposes. Instead, your entire dev
set might have to be used as the Eyeball dev set—i.e., you would manually examine all the
dev set data.
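The splitting rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_dev_set` and the default thresholds are hypothetical, not values prescribed by the text (the 100-example Blackbox floor merely echoes the "small but still useful" remark).

```python
import random


def split_dev_set(examples, eyeball_size=500, min_blackbox_size=100, seed=0):
    """Split a dev set into (eyeball, blackbox) lists.

    eyeball_size and min_blackbox_size are illustrative defaults
    (assumptions), to be adjusted for your own problem.
    """
    shuffled = list(examples)
    # Shuffle with a fixed seed so the split is random but reproducible.
    random.Random(seed).shuffle(shuffled)

    # Too small to give both subsets a useful size:
    # use the whole dev set as the Eyeball dev set.
    if len(shuffled) < eyeball_size + min_blackbox_size:
        return shuffled, []

    # Otherwise the first eyeball_size examples are examined by hand
    # and the remainder is used only for automatic evaluation.
    return shuffled[:eyeball_size], shuffled[eyeball_size:]
```

For instance, with these defaults a 5,000-example dev set would yield 500 Eyeball and 4,500 Blackbox examples, while a 300-example dev set would be returned entirely as the Eyeball dev set.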
Between the Eyeball and Blackbox dev sets, I consider the Eyeball dev set more important
(assuming that you are working on a problem that humans can solve well and that examining
the examples helps you gain insight). If you only have an Eyeball dev set, you can perform
error analysis, model selection, and hyperparameter tuning all on that set. The downside of
having only an Eyeball dev set is that the risk of overfitting the dev set is greater, because
you tune on the same examples you have inspected by hand.
If you have plentiful access to data, then the size of the Eyeball dev set would be determined
mainly by how many examples you have time to manually analyze. For example, I’ve rarely
seen anyone manually analyze more than 1,000 errors.
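As a rough back-of-the-envelope sketch, the time budget can be turned into an Eyeball dev set size. The function `eyeball_dev_set_size` and all of its parameters (review hours, seconds per error, the classifier's dev set error rate) are assumptions introduced here for illustration; the text itself gives no formula.

```python
import math


def eyeball_dev_set_size(review_hours, seconds_per_error, error_rate,
                         available_examples):
    """Estimate how many examples to place in the Eyeball dev set.

    All inputs are hypothetical planning figures:
      review_hours       -- time you can spend on manual analysis
      seconds_per_error  -- average time to inspect one misclassified example
      error_rate         -- fraction of dev examples the model gets wrong
      available_examples -- total dev examples you could draw from
    """
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")

    # Number of misclassified examples you have time to look at.
    errors_reviewable = int(review_hours * 3600 // seconds_per_error)

    # Examples needed so that the model produces about that many errors.
    needed = math.ceil(errors_reviewable / error_rate)

    # You cannot use more examples than you actually have.
    return min(needed, available_examples)
```

Under these assumptions, five hours of review at one minute per error covers 300 errors; with a 25% error rate that corresponds to an Eyeball dev set of 1,200 examples.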
Page 39 Machine Learning Yearning-Draft Andrew Ng