
22 Comparing to the optimal error rate

In our cat recognition example, the “ideal” error rate—that is, one achievable by an “optimal”
classifier—is nearly 0%. A human looking at a picture would be able to recognize if it
contains a cat almost all the time; thus, we can hope for a machine that would do just as well.

Other problems are harder. For example, suppose that you are building a speech recognition
system, and find that 14% of the audio clips have so much background noise or are so
unintelligible that even a human cannot recognize what was said. In this case, even the most
“optimal” speech recognition system might have error around 14%.

Suppose that on this speech recognition problem, your algorithm achieves:

• Training error = 15%

• Dev error = 30%

The training set performance is already close to the optimal error rate of 14%. Thus, there is
not much room for improvement in terms of bias or in terms of training set performance.
However, this algorithm is not generalizing well to the dev set; thus there is ample room for
improvement in the errors due to variance.
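
The comparison above comes down to two subtractions. The following is a minimal sketch of that arithmetic, not a procedure given in the text: the function name `error_gaps` and its parameter names are assumptions introduced here for illustration, and all values are in percentage points.

```python
def error_gaps(optimal_error, training_error, dev_error):
    """Return two gaps, in percentage points.

    bias_gap:     how far training error sits above the optimal error rate.
    variance_gap: how much worse the dev set is than the training set.
    """
    bias_gap = training_error - optimal_error
    variance_gap = dev_error - training_error
    return bias_gap, variance_gap


# Numbers from the speech recognition example above.
bias_gap, variance_gap = error_gaps(optimal_error=14, training_error=15, dev_error=30)
print(f"Training error above optimal: {bias_gap} points")
print(f"Dev error above training:     {variance_gap} points")
```

With these numbers the first gap is 1 point and the second is 15 points, which matches the conclusion that the room for improvement lies mostly in variance.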

This example is similar to the third example from the previous chapter, which also had a
training error of 15% and dev error of 30%. If the optimal error rate is ~0%, then a training
error of 15% leaves much room for improvement. This suggests bias-reducing changes might
be fruitful. But if the optimal error rate is 14%, then the same training set performance tells
us that there’s little room for improvement in the classifier’s bias.
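
To see how the same training error reads differently under the two optimal error rates, here is a small sketch. The helper name `room_above_optimal` is hypothetical, and the text's "~0%" is approximated as exactly 0 for the sake of the arithmetic.

```python
def room_above_optimal(training_error, optimal_error):
    """Gap between training error and the assumed optimal error, in points."""
    return training_error - optimal_error


training_error = 15  # percent, as in the example

# The two optimal error rates considered in the text (percent);
# "~0%" is approximated as 0 here.
for optimal_error in (0, 14):
    gap = room_above_optimal(training_error, optimal_error)
    print(f"optimal = {optimal_error}%: training error is {gap} points above optimal")
```

A gap of 15 points suggests bias-reducing changes may pay off, while a gap of 1 point suggests there is little left to gain on the training set.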

For problems where the optimal error rate is far from zero, here’s a more detailed
breakdown of an algorithm’s error. Continuing with our speech recognition example above,
the total dev set error of 30% can be broken down as follows (a similar analysis can be
applied to the test set error):

• Optimal error rate (“unavoidable bias”): 14%. Suppose we decide that, even with the
best possible speech system in the world, we would still suffer 14% error. We can think of
this as the “unavoidable” part of a learning algorithm’s bias.

Page 46 Machine Learning Yearning-Draft Andrew Ng
