41 Identifying Bias, Variance, and Data Mismatch Errors
Suppose humans achieve almost perfect performance (≈0% error) on the cat detection task,
and thus the optimal error rate is about 0%. Suppose you have:
• 1% error on the training set.
• 5% error on the training dev set.
• 5% error on the dev set.
What does this tell you? Here, you know that you have high variance. The variance reduction
techniques described earlier should allow you to make progress.
Now, suppose your algorithm achieves:
• 10% error on the training set.
• 11% error on the training dev set.
• 12% error on the dev set.
This tells you that you have high avoidable bias on the training set. I.e., the algorithm is
doing poorly on the training set. Bias reduction techniques should help.
In the two examples above, the algorithm suffered from only high avoidable bias or high
variance. It is possible for an algorithm to suffer from any subset of high avoidable bias, high
variance, and data mismatch. For example:
• 10% error on the training set.
• 11% error on the training dev set.
• 20% error on the dev set.
This algorithm suffers from high avoidable bias and from data mismatch. It does not,
however, suffer from high variance on the training set distribution.
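The arithmetic behind these diagnoses can be captured in a few lines. The sketch below is not from the original text; the function name `diagnose` and the use of fractional error rates are assumptions for illustration. It simply takes the differences between successive error rates (human-level, training, training dev, and dev) and reproduces the three scenarios above:

```python
# Minimal sketch: decompose dev-set error into avoidable bias, variance,
# and data mismatch, given four error rates expressed as fractions.

def diagnose(human_error, train_error, train_dev_error, dev_error):
    avoidable_bias = train_error - human_error       # gap to optimal (human-level) error
    variance = train_dev_error - train_error         # gap between seen and unseen data from the same distribution
    data_mismatch = dev_error - train_dev_error      # gap caused by the dev set coming from a different distribution
    return avoidable_bias, variance, data_mismatch

# The three scenarios above (optimal error is about 0%):
print(diagnose(0.00, 0.01, 0.05, 0.05))  # (0.01, 0.04, 0.00) -> mainly high variance
print(diagnose(0.00, 0.10, 0.11, 0.12))  # (0.10, 0.01, 0.01) -> mainly high avoidable bias
print(diagnose(0.00, 0.10, 0.11, 0.20))  # (0.10, 0.01, 0.09) -> high avoidable bias and data mismatch
```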
It might be easier to understand how the different types of errors relate to each other by
drawing them as entries in a table: