
22 Comparing to the optimal error rate




             In our cat recognition example, the “ideal” error rate—that is, one achievable by an “optimal”
             classifier—is nearly 0%. A human looking at a picture would be able to recognize if it
             contains a cat almost all the time; thus, we can hope for a machine that would do just as well.

             Other problems are harder. For example, suppose that you are building a speech recognition

             system, and find that 14% of the audio clips have so much background noise or are so
             unintelligible that even a human cannot recognize what was said. In this case, even the most
             “optimal” speech recognition system might have error around 14%.

             Suppose that on this speech recognition problem, your algorithm achieves:


             • Training error = 15%

             • Dev error = 30%

             The training set performance is already close to the optimal error rate of 14%. Thus, there is
             not much room for improvement in terms of bias or in terms of training set performance.
             However, this algorithm is not generalizing well to the dev set; thus there is ample room for
             improvement in the errors due to variance.


             This example is similar to the third example from the previous chapter, which also had a
             training error of 15% and dev error of 30%. If the optimal error rate is ~0%, then a training
             error of 15% leaves much room for improvement. This suggests bias-reducing changes might
             be fruitful. But if the optimal error rate is 14%, then the same training set performance tells
             us that there’s little room for improvement in the classifier’s bias.
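To make this contrast concrete, here is a minimal sketch (a hypothetical Python helper, not from this book) that splits the dev set error into the components discussed in this and the previous chapter, using avoidable bias = training error - optimal error and variance = dev error - training error:

    # A minimal sketch (not from this book) of the error decomposition
    # discussed above. Errors are given in percent; the function name is ours.

    def decompose_dev_error(optimal_error, training_error, dev_error):
        """Split dev set error into unavoidable bias, avoidable bias, and variance."""
        unavoidable_bias = optimal_error                 # error even an "optimal" system makes
        avoidable_bias = training_error - optimal_error  # room left to improve on the training set
        variance = dev_error - training_error           # failure to generalize from training to dev
        return unavoidable_bias, avoidable_bias, variance

    # Speech recognition example: optimal error rate ~14%.
    print(decompose_dev_error(14, 15, 30))  # (14, 1, 15): little avoidable bias, large variance

    # Same algorithm, but with an optimal error rate of ~0% (as in the cat example).
    print(decompose_dev_error(0, 15, 30))   # (0, 15, 15): bias-reducing changes now look fruitful

Note how identical training and dev errors lead to very different diagnoses depending on the optimal error rate.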


For problems where the optimal error rate is far from zero, here’s a more detailed
breakdown of an algorithm’s error. Continuing with our speech recognition example above,
the total dev set error of 30% can be broken down as follows (a similar analysis can be
applied to the test set error):

• Optimal error rate (“unavoidable bias”): 14%. Suppose we decide that, even with the
  best possible speech system in the world, we would still suffer 14% error. We can think of
  this as the “unavoidable” part of a learning algorithm’s bias.










