Page 78 -

P. 78

• Training set. This is the data that the algorithm will learn from (e.g., Internet images +
Mobile images). This does not have to be drawn from the same distribution as what we
really care about (the dev/test set distribution).

• Training dev set: This data is drawn from the same distribution as the training set (e.g.,
Internet images + Mobile images). This is usually smaller than the training set; it only
needs to be large enough to evaluate and track the progress of our learning algorithm.

• Dev set: This is drawn from the same distribution as the test set, and it reflects the
distribution of data that we ultimately care about doing well on. (E.g., mobile images.)

• Test set: This is drawn from the same distribution as the dev set. (E.g., mobile images.)

Armed with these four separate datasets, you can now evaluate:

• Training error, by evaluating on the training set.

• The algorithm’s ability to generalize to new data drawn from the training set distribution,
by evaluating on the training dev set.

• The algorithm’s performance on the task you care about, by evaluating on the dev and/or
test sets.

Most of the guidelines in Chapters 5-7 for picking the size of the dev set also apply to the
training dev set.

Page 78 Machine Learning Yearning-Draft Andrew Ng

73 74 75 76 77 78 79 80 81 82 83